Overview of Multi-site Caching
Overview of Multi-site Caching
A multi-site installation consists of two or more distributed systems that are loosely coupled. Each site manages its own distributed system, but region data is distributed to remote sites using one or more logical connections.
The logical connections consist of a gateway sender in the sending site, and a gateway receiver in the receiving site. In a client/server installation, gateway senders and gateway receivers are configured in the server layer.
Gateway senders and receivers are defined at startup in the distributed system member caches. A site can use serial and/or parallel gateway sender configurations, as described in Gateway Senders.
Consistency for WAN Updates
GemFire ensures that all copies of a region eventually reach a consistent state on all members and clients that host the region, including GemFire members that distribute region events across a WAN.
By default, potential WAN conflicts are resolved using a timestamp mechanism. You can optionally install a custom conflict resolver to apply custom logic when determining whether to apply a potentially conflicting update received over a WAN.
Consistency for Region Updates describes how GemFire ensures consistency within a cluster, in client caches, and when applying updates over a WAN. Resolving Conflicting Events provides more details about implementing a custom conflict resolver for WAN updates.
Discovery for Multi-Site Systems
Each GemFire cluster in a WAN configuration uses locators to discover remote GemFire clusters as well as local GemFire members.
Each locator in a WAN configuration defines a unique distributed-system-id property that identifies the local cluster to which it belongs. A locator uses the remote-locators property to define the addresses of one or more locators in remote GemFire clusters to use for WAN distribution.
When a locator starts up, it contacts each locator that is configured in the remote-locators property to exchange information about the available locators and gateway receivers in the cluster. The locator also shares information about locators and gateway receivers in any other GemFire clusters that have connected to the cluster. Connected clusters can then use the shared gateway receiver information to distribute region events according to their configured gateway senders.
Each time a new locator starts up or an existing locator shuts down, the changed information is broadcast to other connected GemFire clusters.
A GemFire cluster uses a gateway sender to distribute region events to another, remote GemFire cluster. You can create multiple gateway sender configurations to distribute region events to multiple remote clusters, and/or to distribute region events concurrently to another remote cluster.
A gateway sender always communicates with a gateway receiver in a remote cluster. Gateway senders do not communicate directly with other cache server instances. See Gateway Receivers.
GemFire provides two types of gateway sender configurations: serial gateway senders and parallel gateway senders.
Serial Gateway Senders
Because a serial gateway sender distributes all of a region's events from a single GemFire member, it provides the most control over ordering region events as they are distributed across the WAN. However, a serial gateway sender provides only a finite amount of throughput for distributing events. As you add more regions and servers to the local cluster, you may need to configure additional serial gateway senders manually and isolate individual regions on specific serial gateway senders to handle the increased distribution traffic.
Parallel Gateway Senders
A parallel gateway sender distributes region events simultaneously from each GemFire server that hosts the region. For a partitioned region, this means that each GemFire server that hosts buckets for the region uses its own logical queue to distribute events for those buckets. As you add new GemFire servers to scale the partitioned region, WAN distribution throughput scales automatically with each new instance of the parallel gateway sender.
A parallel gateway sender uses multiple queues on multiple GemFire members to simultaneously distribute region events to a remote GemFire cluster. Each queue distributes only part of the events for a configured region. For example, for a partitioned region, each queue distributes only those events for the local partition.
Distributed, non-replicated regions can also use a parallel gateway sender for distribution. With a distributed region, GemFire creates a separate gateway sender and queue on each member that hosts the region, and then hashes region events to place them into a distinct queue. This provides a level of scalability for distributed regions that is similar to that provided for partitioned regions.
Although parallel gateway senders provide the best throughput for WAN distribution, they provide less control for ordering events. With a parallel gateway sender, you cannot preserve event ordering for the region as a whole because multiple GemFire servers distribute the regions events at the same time. However, the ordering of events for a given partition (or in a given queue of a distributed region) can be preserved. See Configuring Multi-Site (WAN) Event Queues.
Gateway Sender Queues
The queue that a gateway sender uses to distribute events to a remote site overflows to disk as needed, in order to prevent the GemFire member from running out of memory. You can configure the maximum amount of memory that each queue uses, as well as the batch size and frequency for processing batches in the queue. You can also configure these queues to persist to disk, so that a gateway sender can pick up where it left off when its member shuts down and is later restarted.
By default gateway sender queues use 5 threads to dispatch queued events. With a serial gateway sender, the single, logical queue that is hosted on a GemFire member is divided into multiple physical queues (5 by default) each with a dedicated dispatcher thread. You can configure whether the threads dispatch queued events by key, by thread, or in the same order in which events were added to the queue. For a parallel gateway sender, each logical queue that is hosted on a GemFire member is processed simultaneously by multiple threads.
High Availability for Gateway Senders
When a serial gateway sender configuration is deployed to multiple GemFire members, only one "primary" sender is active at a given time. All other serial gateway sender instances are inactive "secondaries" that are available as backups if the primary sender shuts down. GemFire designates the first gateway sender to start up as the primary sender, and all other senders become secondaries. As gateway senders start and shut down in the distributed system, GemFire ensures that the oldest running gateway sender operates as the primary.
A parallel gateway sender is deployed to multiple GemFire members by default, and each GemFire member that hosts primary buckets for a partitioned region actively distributes data to the remote GemFire site. When you use parallel gateway senders, high availability for WAN distribution is provided if you configure the partitioned region for redundancy. With a redundant partitioned region, if a member that hosts primary buckets fails or is shut down, then a GemFire member that hosts a redundant copy of those buckets takes over WAN distribution for those buckets.
Stopping Gateway Senders
The scope of the gateway sender stop operation is the VM on which it is invoked. When you stop a parallel gateway sender using the GatewaySender.stop() or gfsh stop gateway-sender, the gateway sender is stopped on the individual node where this API is called. If the gateway sender is not parallel (serial), then the gateway sender will stop on the local VM, and the secondary gateway sender will become primary and start dispatching events. The gateway sender will wait for GatewaySender.MAXIMUM_SHUTDOWN_WAIT_TIME seconds before stopping itself (by default, this value is set to 0.) You can set this Java system property when starting the server member in gfsh. If the Java system property is set to -1, then the gateway sender process will wait until all events are dispatched from the queue before stopping.
The API and gfsh command stops the parallel gateway sender in one member, which causes data loss because events to buckets in that member will be dropped by the stopped sender. The partitioned region does not failover in this scenario since the member is still running. Instead, to ensure that the remaining events are sent, shut down the entire member to ensure proper failover of partition region events. When a member with the stopped parallel sender is shut down, the other parallel gateway sender members hosting the partition region become primary and deliver the remaining events. In addition, if the whole cluster is brought down after stopping an individual parallel gateway sender, then events queued on that gateway sender can be lost.
Pausing Gateway Senders
Similar to stopping a gateway sender, the scope of pausing a gateway sender is the VM on which it is invoked. Pausing a gateway sender temporarily stops the dispatching of events from the underlying queue. Note that events are still queued into the queue. In case where the gateway sender is parallel, the gateway sender is paused on the individual node where the GatewaySender.pause() API is called or the gfsh pause gateway-sender command is invoked. The parallel gateway senders on other members can still dispatch events. In case where the paused gateway sender is not parallel (serial) and is not primary, then the primary gateway sender will still continue dispatching events. The batch of events that are in the process of being dispatched are dispatched regardless of the state of the pause operation. We can expect a maximum of one batch of events being received at the gateway receiver even after the gateway senders have been paused.
A gateway receiver configures a physical connection for receiving region events from gateway senders in one or more remote GemFire clusters.
A gateway receiver applies each region event to the same region or partition that is hosted in the local GemFire member. (An exception is thrown if the receiver receives an event for a region that it does not define.)
Gateway senders use any available gateway receiver in the target cluster to send region events. You can deploy gateway receiver configurations to multiple GemFire members as needed for high availability and load balancing, however you can only host one gateway receiver per member.
After you create a gateway receiver, you can configure the gateway receiver to start automatically or to require a manual start. The current default is to require a manual start for the gateway receiver (manual-start is set to true).
After you create and start a new gateway receiver at one WAN site, you can execute the load-balance gateway-sender command in gfsh for existing remote gateway senders so that the new receiver can pick up connections to gateway senders at different sites. You invoke this command on the gateway senders to redistribute connections more evenly among all the gateway receivers. Another option is to use the GatewaySender.rebalance Java API.