Producing Artifacts for Troubleshooting
There are several types of files that are critical for troubleshooting.
GemFire logs and statistics are the two most important artifacts used in troubleshooting. In addition, they are required for GemFire system health verification and performance analysis. For these reasons, logging and statistics should always be enabled, especially in production. Save the following files for troubleshooting purposes:
- Log files. Even at the default logging level, the log contains data that may be important. Save the whole log, not just the stack. For comparison, save log files from before, during, and after the problem occurred.
- Statistics archive files.
- Core files or stack traces.
- For Linux, you can use gdb to extract a stack from a core file.
- Crash dumps.
- For Windows, save the user mode dump files.
Some locations to check for these files:
- C:\ProgramData\Microsoft\Windows\WER\ReportArchive
- C:\ProgramData\Microsoft\Windows\WER\ReportQueue
- C:\Users\UserProfileName\AppData\Local\Microsoft\Windows\WER\ReportArchive
- C:\Users\UserProfileName\AppData\Local\Microsoft\Windows\WER\ReportQueue
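Gathering the files listed above into one archive per member can make them easier to hand off for analysis. The following is a minimal sketch, not a GemFire tool: the function name `collect_artifacts` and the file patterns (`*.log`, `*.gfs`, `core*`, `*.dmp`) are assumptions for illustration, and should be adjusted to match the file names configured for your members.

```python
import zipfile
from pathlib import Path

# Assumed file name patterns (not defined by GemFire); adjust them to the
# names you configured for log files and statistics archives.
ARTIFACT_PATTERNS = ("*.log", "*.gfs", "core*", "*.dmp")


def collect_artifacts(member_dir, archive_path, patterns=ARTIFACT_PATTERNS):
    """Zip every file under member_dir that matches one of the patterns.

    Returns the archived paths, relative to member_dir.
    """
    member_dir = Path(member_dir)
    archive_path = Path(archive_path)
    found = []
    for pattern in patterns:
        for path in sorted(member_dir.rglob(pattern)):
            # Skip directories, duplicates, and a pre-existing archive.
            if not path.is_file() or path in found:
                continue
            if path.resolve() == archive_path.resolve():
                continue
            found.append(path)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in found:
            archive.write(path, str(path.relative_to(member_dir)))
    return [str(path.relative_to(member_dir)) for path in found]
```

Calling `collect_artifacts("/path/to/member/dir", "member-artifacts.zip")` would archive the whole files rather than excerpts, in line with the advice to save the entire log.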
When a problem arises that involves more than one process, a network problem is the most likely cause. When you diagnose a problem, create a log file for each member of all the distributed systems involved. If you are running a client/server architecture, create log files for the clients.
- Make sure the host’s clock is synchronized with the other hosts. Use a time synchronization tool such as Network Time Protocol (NTP).
- Enable logging to a file instead of standard output by editing gemfire.properties to include this line:
log-file=filename
- Keep the log level at config to avoid filling up the disk while including configuration information. Add this line to gemfire.properties:
log-level=config
Note: Running with the log level at fine can impact system performance and fill up your disk.
- Enable statistics gathering for the distributed system either by modifying gemfire.properties:
statistic-sampling-enabled=true
statistic-archive-file=StatisticsArchiveFile.gfs
or by using the gfsh alter runtime command:
alter runtime --group=myMemberGroup --enable-statistics=true --statistic-archive-file=StatisticsArchiveFile.gfs
Note: Collecting statistics at the default sample rate of 1000 milliseconds does not incur performance overhead.
- Run the application again.
- Examine the log files. To get the clearest picture, merge the files. To find all the errors in the log file, search for lines that begin with these strings:
[error
[severe
For details on merging log files, see the --merge-log argument for the export logs command.
- Export and analyze the stack traces on the member or member group where the application is running. Use the gfsh export stack-traces command. For example:
gfsh> export stack-traces --file=ApplicationStackTrace.txt --member=member1
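The log search described in the steps above can be scripted. The following is a minimal sketch, assuming plain-text log files; the function name `find_error_lines` is hypothetical and not part of GemFire or gfsh. It reports every line that begins with `[error` or `[severe`, along with its line number.

```python
import sys

# The two line prefixes named in the troubleshooting steps.
ERROR_PREFIXES = ("[error", "[severe")


def find_error_lines(log_path, prefixes=ERROR_PREFIXES):
    """Return (line_number, text) for each line starting with a prefix."""
    hits = []
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if line.startswith(prefixes):
                hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage: python find_errors.py member1.log member2.log
    for path in sys.argv[1:]:
        for number, line in find_error_lines(path):
            print(f"{path}:{number}: {line}")
```

Only lines that start with one of the prefixes are matched, so continuation lines such as the body of a stack trace are not reported; open the log at the reported line number to read the full entry.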