@cordt-sei
Last active October 25, 2024 20:27

Node Debugging: Submitting Info for Crash or "AppHash" Error

If your Sei node crashes, halts, or reports an "AppHash" error and you are able to collect data, please follow the guidelines below for the specific issue.

1. AppHash Error

For app hash mismatches, capture the node's state so it can be compared against a known-good state:

  • State Dump:
    • Legacy IAVL DB:
      seid debug dump-iavl <latest height>
      
    • SeiDB (most non-archive nodes):
      Follow these steps to perform a state dump:
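      # Build and install the seidb tool
      git clone https://github.com/sei-protocol/sei-db.git
      cd sei-db/tools
      make install
      # Stop the node, dump the state, then start the node again.
      # Adjust the output path (-o) for your system.
      systemctl stop seid
      seidb dump-iavl -d ~/.sei/data/committer.db -o /home/ubuntu/iavl-dump
      systemctl start seid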
  • State Hashes:
    Include the app hash, commit hash, and block height from logs.

2. Panics, Crashes, or Nil Pointer Exceptions

For most other incidents/issues:

  • Stack Trace or Error Logs:
    Capture the full stack trace or error logs at the time of the crash.
    • Provide at least 1,000 lines of logs leading up to the crash or 15 minutes of log data, whichever is more useful.
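
One way to capture that window is sketched below. It assumes seid runs as a systemd unit named seid (as the systemctl commands earlier imply) and logs to the systemd journal. The function name and output file names are placeholders, not part of any Sei tooling.

```shell
# Sketch: save recent seid logs from the systemd journal.
# Assumes the unit is named "seid"; capture_seid_logs and the output
# file names are illustrative only.
capture_seid_logs() {
    outdir="${1:-.}"
    mkdir -p "$outdir" || return 1
    # The most recent 1,000 log lines.
    journalctl -u seid -n 1000 --no-pager > "$outdir/seid-last-1000.log"
    # Everything logged in the last 15 minutes.
    journalctl -u seid --since "15 minutes ago" --no-pager > "$outdir/seid-last-15min.log"
}

# Usage: capture_seid_logs /tmp/seid-incident
```

Both commands are relative to the current time, so run them soon after the crash. If the crash happened earlier, journalctl's --since and --until options also accept absolute timestamps.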

Submission

  • Logs or files under 10 MB: Upload to GitHub Gist and share the link.
  • Larger files (e.g., state dumps): Contact the support team for alternative upload methods.
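
A quick size check can tell you which route applies. This is a minimal sketch: fits_in_gist is a hypothetical helper name, and 10 MB is taken here as 10 * 1024 * 1024 bytes.

```shell
# Sketch: succeed if the file is under the 10 MB limit mentioned above.
# Returns 0 if under the limit, 1 if not, 2 if the file does not exist.
fits_in_gist() {
    [ -f "$1" ] || return 2
    limit=$((10 * 1024 * 1024))
    # tr strips the padding some wc implementations add.
    size=$(wc -c < "$1" | tr -d ' ')
    [ "$size" -lt "$limit" ]
}

# Usage:
# if fits_in_gist seid-last-1000.log; then echo "upload to Gist"; else echo "contact support"; fi
```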

Thank you for your efforts to assist with debugging! Your contributions help speed up issue resolution.

Better Logging Practices

To improve logging and make post-incident reports as detailed as is practical:

1. Enable Detailed Logging

  • Set a detailed log level in config.toml:
    log_level = "debug"  # or "trace" for maximum detail
  • Choose the log format based on your needs:
    • Plain Text: Easier to read manually.
      log_format = "plain"
    • JSON Format: More suitable for structured log analysis and integration with log management tools.
      log_format = "json"

2. Log Rotation and Retention

  • Use a log rotation tool like logrotate.

  • Rotate logs daily or when exceeding a specified size (e.g., 100 MB).

  • Retain 7-14 days of logs.

  • Considerations:

    • Risk of losing older logs
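
The rotation policy above could be expressed as a logrotate configuration along these lines. This is a sketch under assumptions: it presumes seid's output is written to a plain file (the path in the comments is hypothetical), and the helper name write_seid_logrotate is illustrative. If seid logs only to the systemd journal, journald's own retention settings apply instead.

```shell
# Sketch: write a logrotate policy matching the bullets above.
# Both arguments are assumptions to adapt to your own setup.
write_seid_logrotate() {
    logfile="$1"   # e.g. /var/log/seid/seid.log (hypothetical path)
    dest="$2"      # e.g. /etc/logrotate.d/seid
    cat > "$dest" <<EOF
$logfile {
    daily
    maxsize 100M
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
}

# Usage (as root): write_seid_logrotate /var/log/seid/seid.log /etc/logrotate.d/seid
```

Here maxsize only triggers an early rotation if logrotate itself runs more often than once a day, and copytruncate avoids restarting the node at the cost of possibly losing a few lines written during the copy.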

3. Post-Incident Procedures

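  • Enable core dumps for crashes (the limit applies to the current shell and the processes started from it):

    ulimit -c unlimited
  • Configure the core dump location (requires root):

    echo "/tmp/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern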
  • Considerations:

    • Security risks if sensitive data is exposed
    • Increased disk usage
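
When seid runs under systemd, a ulimit set in an interactive shell does not reach the service, so the limit has to be set on the unit itself. The sketch below writes a drop-in for that purpose. It assumes the unit is named seid; the helper name and the drop-in file name are arbitrary. Because seid is a Go binary, it also sets GOTRACEBACK=crash, which makes the Go runtime abort on a panic so that a core dump can be produced.

```shell
# Sketch: systemd drop-in enabling core dumps for the service itself.
# Assumes a unit named "seid"; the file name coredump.conf is arbitrary.
write_seid_coredump_dropin() {
    dir="${1:-/etc/systemd/system/seid.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/coredump.conf" <<'EOF'
[Service]
# Service-level equivalent of "ulimit -c unlimited".
LimitCORE=infinity
# Make the Go runtime abort on panic so a core dump is produced.
Environment=GOTRACEBACK=crash
EOF
}

# Usage (as root), then reload systemd and restart the node:
# write_seid_coredump_dropin && systemctl daemon-reload && systemctl restart seid
```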

4. Automated Alerts and Actions

  • Use monitoring tools like Prometheus.

  • Automate data collection for specific incidents.

  • Considerations:

    • Risk of alert fatigue if alerts are too frequent.
    • Added system complexity with automation scripts.
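
As a small illustration of automated data collection, the sketch below saves recent logs whenever the service is found to be down. It assumes a systemd unit named seid that logs to the journal; collect_if_down, the default directory, and the file naming are all hypothetical.

```shell
# Sketch: collect logs automatically when the seid service is not running.
# Intended to be run periodically, for example from cron or a systemd timer.
collect_if_down() {
    outdir="${1:-/var/tmp/seid-incidents}"
    if systemctl is-active --quiet seid; then
        return 0   # node is running, nothing to collect
    fi
    mkdir -p "$outdir" || return 1
    stamp=$(date -u +%Y%m%dT%H%M%SZ)
    journalctl -u seid -n 1000 --no-pager > "$outdir/seid-$stamp.log"
    # Print the path of the saved file for the caller.
    echo "$outdir/seid-$stamp.log"
}

# Usage: collect_if_down /var/tmp/seid-incidents
```

As written, every run while the node is down saves another copy, so a real setup would likely add a guard against repeated collection.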

5. Log Aggregation Tools

  • Use log aggregation tools like Fluentd, Logstash, or Graylog for centralized logging.

  • Considerations:

    • Additional system load
    • Maintenance required