Creating Backups for System Recovery and Operational Management
A backup is a copy of persisted data from a disk store. A backup is used to restore the disk store to the state it was in when the backup was made. The appropriate backup and restore procedures differ depending on whether the distributed system is online or offline. An online system has currently running members. An offline system does not have any running members.
- Making a Backup While the System Is Online
- What a Full Online Backup Saves
- What an Incremental Online Backup Saves
- Disk Store Backup Directory Structure and Contents
- Offline Members: Manual Catch-Up to an Online Backup
- Restore Using a Backup Made While the System Was Online
Making a Backup While the System Is Online
- Consider compacting your disk store before making a backup. If auto-compaction is turned off, you may want to do a manual compaction to reduce the amount of data that the backup copies over the network. For more information on configuring a manual compaction, see Manual Compaction.
- Run the backup during a period of low activity in your system. The backup does not block system activities, but it uses file system resources on all hosts in your distributed system, and it can affect performance.
- Configure each member with any additional files or directories to be backed up by modifying the member's cache.xml file. Additional items that ought to be included in the backup:
- application jar files
- other files that the application needs when starting, such as a file that sets the classpath
For example:
<backup>./myExtraBackupStuff</backup>
Directories are copied recursively, and any disk stores found within them are excluded from this user-specified backup.
- Back up to a SAN (recommended) or to a directory that all members can access. Make sure the directory exists and has the proper permissions for all members to write to the directory and create subdirectories.
The directory specified for the backup can be used multiple times. Each time a backup is made, a new subdirectory is created within the specified directory, and that new subdirectory's name represents the date and time. You can use one of two locations for the backup:
- a single physical location, such as a network file server, for example: /export/fileServerDirectory/gemfireBackupLocation
- a directory that is local to all host machines in the system
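The directory requirement above can be checked before a backup is started. The following is a minimal sketch, not part of the product: `check_backup_dir` is a hypothetical helper, and it only tests the account and host that run the script, so it would need to be run on each host under the account that the members use.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def check_backup_dir(backup_dir: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    path = Path(backup_dir)
    if not path.is_dir():
        return [f"{backup_dir} does not exist or is not a directory"]
    try:
        # Creating and removing a scratch subdirectory exercises the same
        # permission a backup needs, since each run creates a new subdirectory.
        scratch = tempfile.mkdtemp(prefix="backup-preflight-", dir=path)
        os.rmdir(scratch)
    except OSError as exc:
        return [f"cannot create subdirectories in {backup_dir}: {exc}"]
    return []
```

An empty result means only that this host and account can create subdirectories. It says nothing about other hosts in the system.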
- Make sure all members with persistent data are running in the system, because offline members cannot back up their disk stores. Output from the backup command will not identify members that are offline.
- If auto-compaction is disabled and manual compaction is needed:
gfsh>compact disk-store --name=Disk1
- Run the gfsh backup disk-store command, specifying the backup directory location.
gfsh>backup disk-store --dir=/export/fileServerDirectory/gemfireBackupLocation
The output will list information for each member that has successfully backed up disk stores. The tabular information will contain the member's name, its UUID, the directory backed up, and the host name of the member.
Any online member that fails to complete its backup will leave a file named INCOMPLETE_BACKUP in its highest-level backup directory. The existence of this file indicates that the member's backup is only partial and cannot be used in a restore operation.
- Validate the backup for later recovery use. On the command line, each backup can be checked with commands such as:
cd 2010-04-10-11-35/straw_14871_53406_34322/diskstores/ds1
gfsh validate offline-disk-store --name=ds1 --disk-dirs=/home/dsmith/dir1
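Because the backup command output does not flag partial backups by itself, it can help to scan a finished backup for INCOMPLETE_BACKUP marker files. The sketch below assumes that each member's "highest level backup directory" is the member-named directory directly under the date-and-time subdirectory. `incomplete_members` is a hypothetical helper name.

```python
from pathlib import Path


def incomplete_members(timestamp_dir):
    """Return names of member directories that contain an INCOMPLETE_BACKUP file."""
    root = Path(timestamp_dir)
    return sorted(
        member.name
        for member in root.iterdir()
        # Assumption: the marker sits at the top of each member's directory.
        if member.is_dir() and (member / "INCOMPLETE_BACKUP").exists()
    )
```

An empty list suggests that every member that took part finished its backup. It does not reveal members that were offline and never started one.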
How to Do an Incremental Backup
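An incremental backup contains items that have changed since a previous backup was made. To make one, run the backup disk-store command and use --baseline-dir to specify the directory of the previous backup: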
gfsh>backup disk-store --dir=/export/fileServerDirectory/gemfireBackupLocation --baseline-dir=/export/fileServerDirectory/gemfireBackupLocation/2012-10-01-12-30
The output will appear the same as the output for a full online backup.
Any online member that fails to complete its incremental backup will leave a file named INCOMPLETE_BACKUP in its highest-level backup directory. The existence of this file indicates that the member's backup is only partial and cannot be used in a restore operation. The next time a backup is made, a full backup will be made.
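Since each backup lands in a new subdirectory named for its date and time, the baseline for the next incremental backup can be picked programmatically. The following sketch assumes the subdirectory names look like the examples on this page (2012-10-01-12-30 or 2012-10-18-13-44-53). `latest_backup` and `incremental_backup_command` are hypothetical helpers that only build the command text and do not run gfsh.

```python
import re
from pathlib import Path

# Assumption: names are year-month-day-hour-minute, optionally with seconds.
TIMESTAMP_DIR = re.compile(r"^\d{4}(-\d{2}){4,5}$")


def latest_backup(backup_root):
    """Return the newest timestamp-named subdirectory, or None if there is none."""
    candidates = [
        d for d in Path(backup_root).iterdir()
        if d.is_dir() and TIMESTAMP_DIR.match(d.name)
    ]
    # Comparing the numeric fields left to right orders the names by time.
    return max(
        candidates,
        key=lambda d: tuple(int(part) for part in d.name.split("-")),
        default=None,
    )


def incremental_backup_command(backup_root):
    """Build the gfsh command text; without a baseline it is a full backup."""
    command = f"backup disk-store --dir={backup_root}"
    baseline = latest_backup(backup_root)
    if baseline is not None:
        command += f" --baseline-dir={baseline}"
    return command
```

This sketch does not check the chosen baseline for INCOMPLETE_BACKUP files. A fuller version would skip baselines that contain them.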
What a Full Online Backup Saves
- Disk store files for all members containing persistent region data.
- Files and directories specified in the cache.xml configuration file as <backup> elements. For example: <backup>./myExtraBackupStuff</backup>
- Deployed JAR files that were deployed using the gfsh deploy command.
- Configuration files from the member:
- gemfire.properties, including the properties with which the member was started.
- cache.xml, if used.
- A restore script, named restore.bat on Windows and restore.sh on Linux. This script may later be used to do a restore. The script copies files back to their original locations.
What an Incremental Online Backup Saves
An incremental backup saves the difference between the last backup and the current data. An incremental backup copies only operation logs that are not already present in the baseline directories for each member. For incremental backups, the restore script contains explicit references to operation logs in one or more previously chained incremental backups. When the restore script is run from an incremental backup, it also restores the operation logs from previous incremental backups that are part of the backup chain.
If members are missing from the baseline directory because they were offline or did not exist at the time of the baseline backup, those members place full backups of all their files into the incremental backup directory.
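The selection rule described above can be pictured as a per-member set difference. The following is an illustrative model only, not the product's implementation: `files_to_copy` is a hypothetical function, and operation logs are represented by file names alone.

```python
def files_to_copy(current, baseline):
    """Model which operation logs an incremental backup would copy per member.

    Both arguments map a member name to a set of operation log file names.
    """
    plan = {}
    for member, oplogs in current.items():
        if member not in baseline:
            # No baseline exists for this member, so it contributes a full backup.
            plan[member] = set(oplogs)
        else:
            # Only logs absent from the member's baseline are copied.
            plan[member] = set(oplogs) - set(baseline[member])
    return plan
```

For example, if server1 has one log in the baseline and two now, only the new log is planned. A server2 that is absent from the baseline has all of its logs planned.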
Disk Store Backup Directory Structure and Contents
$ cd thebackupdir
$ ls -R
./2012-10-18-13-44-53:
dasmith_e6410_server1_8623_v1_33892 dasmith_e6410_server2_8940_v2_45565

./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892:
config diskstores README.txt restore.sh user

./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/config:
cache.xml

./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/diskstores:
DEFAULT

./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/diskstores/DEFAULT:
dir0

./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/diskstores/DEFAULT/dir0:
BACKUPDEFAULT_1.crf BACKUPDEFAULT_1.drf BACKUPDEFAULT.if

./2012-10-18-13-44-53/dasmith_e6410_server1_8623_v1_33892/user:
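A layout like the one listed above can be summarized with a short script, for instance to confirm that every member directory has a restore script. This sketch assumes that disk stores sit under a diskstores directory and that the script is named restore.sh or restore.bat. `summarize_backup` is a hypothetical helper.

```python
from pathlib import Path


def summarize_backup(timestamp_dir):
    """Map each member directory to its disk store names and restore script presence."""
    summary = {}
    for member in sorted(Path(timestamp_dir).iterdir()):
        if not member.is_dir():
            continue
        stores_dir = member / "diskstores"
        stores = []
        if stores_dir.is_dir():
            # Each subdirectory of diskstores is treated as one disk store.
            stores = sorted(p.name for p in stores_dir.iterdir() if p.is_dir())
        has_script = any(
            (member / name).is_file() for name in ("restore.sh", "restore.bat")
        )
        summary[member.name] = {"diskstores": stores, "restore_script": has_script}
    return summary
```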
Offline Members: Manual Catch-Up to an Online Backup
If you must have a member offline during an online backup, you can manually back up its disk stores. Bring this member’s files into the online backup framework manually, and create a restore script by hand starting with a copy of another member’s script:
- Duplicate the directory structure of a backed up member for this member.
- Rename directories as needed to reflect this member’s particular backup, including disk store names.
- Clear out all files other than the restore script.
- Copy in this member’s files.
- Modify the restore script to work for this member.
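The first and third steps above (duplicating a backed-up member's directory structure and clearing out everything except the restore script) could be scripted roughly as follows. This is a sketch under assumptions: `make_member_skeleton` is a hypothetical helper, and the renaming, copying in of the member's files, and editing of the restore script remain manual.

```python
import shutil
from pathlib import Path

RESTORE_SCRIPTS = {"restore.sh", "restore.bat"}


def make_member_skeleton(template_member_dir, new_member_dir):
    """Copy a backed-up member's tree, then remove every file except the restore script."""
    new_dir = Path(new_member_dir)
    # copytree raises an error if new_dir already exists, so nothing is overwritten.
    shutil.copytree(template_member_dir, new_dir)
    # Collect the paths first so that files are not deleted during the directory walk.
    for path in list(new_dir.rglob("*")):
        if path.is_file() and path.name not in RESTORE_SCRIPTS:
            path.unlink()
    return new_dir
```

The resulting tree keeps the directory names of the template member, so directories still need to be renamed for this member's disk stores before its files are copied in.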
Restore Using a Backup Made While the System Was Online
- Restore your disk stores while cache members are offline and the system is down.
- Look at each of the restore scripts to see where they will place the files and make sure the destination locations are ready. A restore script will refuse to copy over files with the same names.
- Run each restore script on the host where the backup originated.
The restore scripts copy the following back to their original locations:
- Disk store files for all stores containing persistent region data.
- Any files or directories you have configured to be backed up in the cache.xml <backup> elements.
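Because a restore script refuses to copy over files with the same names, it can be useful to compare a backed-up directory with its destination before running the script. The sketch below assumes that you read the destination path from the restore script yourself. `restore_conflicts` is a hypothetical helper that compares file names in one directory level only.

```python
from pathlib import Path


def restore_conflicts(backup_store_dir, destination_dir):
    """Return file names present in both the backup directory and the destination."""
    dest = Path(destination_dir)
    if not dest.is_dir():
        # A destination that does not exist yet cannot hold conflicting files.
        return []
    backed_up = {p.name for p in Path(backup_store_dir).iterdir() if p.is_file()}
    existing = {p.name for p in dest.iterdir() if p.is_file()}
    return sorted(backed_up & existing)
```

An empty result means that no same-named files were found at that level. Any names it returns would need to be moved aside before the restore.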