System Failure and Recovery
This section describes the alerts raised by various kinds of system failures and the appropriate responses to them. It also helps you plan a data recovery strategy.
Handling Forced Cache Disconnection Using Autoreconnect
A GemFire member may be forcibly disconnected from a GemFire distributed system if the member is unresponsive for a period of time, or if a network partition separates one or more members into a group that is too small to act as the distributed system.
Recovering from Machine Crashes
When a machine crashes because of a shutdown, power loss, hardware failure, or operating system failure, all of its applications and cache servers and their local caches are lost.
Preventing and Recovering from Disk Full Errors
It is important to monitor the disk usage of GemFire members. If a member lacks sufficient disk space for a disk store, the member attempts to shut down the disk store and its associated cache, and logs an error message. A shutdown caused by a member running out of disk space can lead to data loss, data file corruption, log file corruption, and other error conditions that can negatively affect your applications.
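As one way to watch disk usage from outside GemFire, the following minimal sketch uses only the standard `java.io.File` API to compare the used space on a volume against a warning threshold. The class name `DiskSpaceCheck`, its method names, and the 90 percent default are hypothetical choices for illustration. They are not part of GemFire, and the sketch does not replace any disk monitoring that the product itself provides.

```java
import java.io.File;

// Illustrative helper (hypothetical, not a GemFire API): reports whether the
// volume that holds a directory has reached a warning level of disk usage.
final class DiskSpaceCheck {

    // Percentage of the volume in use, given its total and usable byte counts.
    static double usedPercent(long totalBytes, long usableBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be positive");
        }
        return 100.0 * (totalBytes - usableBytes) / totalBytes;
    }

    // True when usage on the volume containing dir is at or above warnPercent.
    // File.getTotalSpace() returns 0 when the path does not name a partition,
    // so that case is reported as "not exceeded" rather than as an error.
    static boolean exceedsThreshold(File dir, double warnPercent) {
        long total = dir.getTotalSpace();
        if (total == 0) {
            return false;
        }
        return usedPercent(total, dir.getUsableSpace()) >= warnPercent;
    }

    public static void main(String[] args) {
        // Assumption: the first argument is the directory that holds the disk
        // store files; the current directory is used when none is given.
        File dir = new File(args.length > 0 ? args[0] : ".");
        double warnPercent = 90.0; // hypothetical threshold, tune per deployment
        if (exceedsThreshold(dir, warnPercent)) {
            System.err.println("WARNING: disk usage at or above " + warnPercent
                    + "% for " + dir.getAbsolutePath());
        } else {
            System.out.println("Disk usage below " + warnPercent
                    + "% for " + dir.getAbsolutePath());
        }
    }
}
```

A check like this could be run on a schedule for each directory that holds disk store files, so that an operator is warned before the member itself runs out of space.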