@mwtian
Last active May 13, 2025 06:05

Debugging Sui node memory growth

When memory usage keeps growing in the sui-node process (validator or fullnode), the issue is usually hard to reproduce and does not show up on many other instances (otherwise it would have been fixed before the release). More information is usually needed to help us debug it.

Metrics

First, check the following metrics to see whether any of them is very large and keeps growing.

  • rocksdb_estimate_table_readers_mem
  • rocksdb_block_cache_usage
  • rocksdb_block_cache_pinned_usage
  • rocksdb_block_cache_capacity
  • rocksdb_num_active_db_handles
  • transaction_manager_num_pending_certificates
  • transaction_manager_num_executing_certificates
  • monitored_futures
  • monitored_tasks

Checking these is easy if Prometheus / Grafana has been set up. Attaching screenshots of the metrics that keep growing is very helpful for investigating the issue.

Otherwise, running curl 127.0.0.1:9184/metrics 2>/dev/null | grep ... on the host returns the current values. However, this only gives a snapshot rather than the trend, so it is less useful for investigations.
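Without Prometheus, you can still capture a rough trend by sampling the endpoint repeatedly. The Python sketch below is one way to do that. It assumes the endpoint at 127.0.0.1:9184/metrics from the curl command above, and it assumes the output uses the standard Prometheus text exposition format. The helper names (WATCHED, parse_metrics, growing, fetch_metrics, watch), the 10-minute interval and the six rounds are hypothetical, illustrative choices and are not part of sui-node.

```python
import time
import urllib.request

# Metric names from the list above (hypothetical constant; extend as needed).
WATCHED = {
    "rocksdb_estimate_table_readers_mem",
    "rocksdb_block_cache_usage",
    "rocksdb_block_cache_pinned_usage",
    "rocksdb_block_cache_capacity",
    "rocksdb_num_active_db_handles",
    "transaction_manager_num_pending_certificates",
    "transaction_manager_num_executing_certificates",
    "monitored_futures",
    "monitored_tasks",
}
# Assumed to be the same endpoint the curl command queries.
METRICS_URL = "http://127.0.0.1:9184/metrics"


def parse_metrics(text, names=WATCHED):
    """Map each series (name plus labels) to its value, keeping only `names`.

    Expects the Prometheus text format: '#' lines are comments and each
    sample line is `name{labels} value [timestamp]`.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name = line[: line.index("{")]
            close = line.rindex("}")
            series, rest = line[: close + 1], line[close + 1 :].split()
        else:
            parts = line.split()
            name = series = parts[0]
            rest = parts[1:]
        if name in names and rest:
            samples[series] = float(rest[0])
    return samples


def growing(before, after):
    """Series present in both snapshots whose value increased."""
    return {
        s: (before[s], v) for s, v in after.items() if s in before and v > before[s]
    }


def fetch_metrics(url=METRICS_URL, timeout=10.0):
    """Fetch the raw metrics page as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def watch(url=METRICS_URL, interval_s=600.0, rounds=6):
    """Take `rounds` snapshots `interval_s` apart; print series that grew."""
    snapshots = []
    for i in range(rounds):
        if i:
            time.sleep(interval_s)
        snapshots.append(parse_metrics(fetch_metrics(url)))
    for series, (old, new) in sorted(growing(snapshots[0], snapshots[-1]).items()):
        print(f"{series}: {old:g} -> {new:g}")
    return snapshots
```

Calling watch() on the host takes six snapshots over 50 minutes. It then prints each watched series that ended higher than it started. The full list of snapshots is returned so you can inspect intermediate values.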

Heap profile

Next, if the root cause cannot be identified from the metrics, sui-node memory profiles need to be collected.

  1. Install jemalloc, jeprof and graphviz. On Ubuntu, the command is sudo apt install libjemalloc-dev graphviz.
  2. Run sui-node with jemalloc as the memory allocator and have it dump memory profiles, both configured via environment variables. On Ubuntu, set: LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so MALLOC_CONF=prof:true,prof_prefix:/opt/sui/jeprof/jeprof.out,lg_prof_interval:36
    • Even if not collecting memory profiles, using jemalloc is recommended.
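    • lg_prof_interval is the base-2 log of the average number of bytes allocated between profile dumps, so 36 means a dump roughly every 2^36 bytes (64 GiB) of allocation. Decrease it to dump profiles more often, or increase it to dump them less often.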
    • The directory specified for prof_prefix, for example /opt/sui/jeprof/, must be created with permissions allowing writes from the sui-node process.
  3. After sui-node runs for a while, the latest memory profiles can be found via ls -1t /opt/sui/jeprof/ | head -20
  4. Visualize one of the listed profiles with sudo jeprof --svg --lines /opt/sui/bin/sui-node /opt/sui/jeprof/<file name selected above> > jeprof.svg
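The steps above can be scripted. The Python sketch below is one possible way. It assumes the paths used in this guide: the Ubuntu libjemalloc.so location, /opt/sui/jeprof and /opt/sui/bin/sui-node. The helper names are hypothetical.

The sketch does four things:
  • builds the LD_PRELOAD / MALLOC_CONF environment for launching sui-node;
  • creates the dump directory;
  • picks the newest profiles the same way ls -1t | head -20 does;
  • runs the same jeprof command.

The jeprof call is made without sudo, so run the script as a user that can read the profile files.

```python
import os
import subprocess
from pathlib import Path

# Paths used in the steps above (assumed; adjust them for your host).
JEMALLOC_SO = "/usr/lib/x86_64-linux-gnu/libjemalloc.so"
PROF_DIR = Path("/opt/sui/jeprof")
SUI_NODE = Path("/opt/sui/bin/sui-node")


def jemalloc_env(prof_dir=PROF_DIR, lg_prof_interval=36, base=None):
    """Return an environment that preloads jemalloc with heap profiling on."""
    env = dict(os.environ if base is None else base)
    env["LD_PRELOAD"] = JEMALLOC_SO
    env["MALLOC_CONF"] = (
        f"prof:true,prof_prefix:{Path(prof_dir) / 'jeprof.out'},"
        f"lg_prof_interval:{lg_prof_interval}"
    )
    return env


def dump_interval_bytes(lg_prof_interval):
    """Average bytes allocated between two profile dumps (2**lg_prof_interval)."""
    return 2 ** lg_prof_interval


def prepare_prof_dir(prof_dir=PROF_DIR):
    """Create the dump directory; it must also be writable by sui-node's user."""
    Path(prof_dir).mkdir(parents=True, exist_ok=True)


def latest_profiles(prof_dir=PROF_DIR, n=20):
    """Newest files first, like `ls -1t <dir> | head -20`."""
    files = [p for p in Path(prof_dir).iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:n]


def jeprof_svg_command(profile, binary=SUI_NODE):
    """Argument list for `jeprof --svg --lines <binary> <profile>`."""
    return ["jeprof", "--svg", "--lines", str(binary), str(profile)]


def render_svg(profile, binary=SUI_NODE, out="jeprof.svg"):
    """Run jeprof and write its SVG output to `out`."""
    with open(out, "wb") as f:
        subprocess.run(jeprof_svg_command(profile, binary), stdout=f, check=True)
```

For example, you could pass env=jemalloc_env() to subprocess.Popen together with the node's usual command line to start sui-node with profiling enabled. Afterwards, render_svg(latest_profiles()[0]) writes the newest dump to jeprof.svg.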

A visualized heap profile will significantly help with locating the issue.
