When memory usage grows in the sui-node process (validator or fullnode), the issue is usually hard to reproduce and not observed on many other instances (otherwise it would have been fixed before the release). More information is usually needed to help us debug the issue.
First, a few metrics can be checked to see if a particular stream is very large and keeps growing.
- rocksdb_estimate_table_readers_mem
- rocksdb_block_cache_usage
- rocksdb_block_cache_pinned_usage
- rocksdb_block_cache_capacity
- rocksdb_num_active_db_handles
- transaction_manager_num_pending_certificates
- transaction_manager_num_executing_certificates
- monitored_futures
- monitored_tasks
This should be easy if Prometheus / Grafana has been set up. Attaching screenshots for the growing metrics would be very helpful for investigating the issue.
Otherwise, running curl 127.0.0.1:9184/metrics 2>/dev/null | grep ... on the host should return the current values. However, this only gives a point-in-time snapshot, not the trend, so it is less useful for investigations.
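Without Prometheus / Grafana, a rough trend can still be captured by sampling the endpoint periodically. The following is a minimal bash sketch, not an official tool: the function names, the default log file name and the 60-second default interval are hypothetical, and it assumes the metrics endpoint returns the usual Prometheus text format with the metric names listed above.

```shell
# Metric names from the list above, joined into one extended regex.
MEMORY_METRICS='rocksdb_estimate_table_readers_mem|rocksdb_block_cache_usage|rocksdb_block_cache_pinned_usage|rocksdb_block_cache_capacity|rocksdb_num_active_db_handles|transaction_manager_num_pending_certificates|transaction_manager_num_executing_certificates|monitored_futures|monitored_tasks'

# Read metrics text on stdin and keep only sample lines for the metrics above.
# Comment lines (starting with '#') and unrelated metrics are dropped.
filter_memory_metrics() {
  grep -E "^(${MEMORY_METRICS})([{ ]|\$)" || true
}

# Append one timestamped snapshot per interval to a log file until interrupted.
# All three arguments are optional (hypothetical defaults shown).
sample_memory_metrics() {
  local interval_seconds="${1:-60}"
  local out_file="${2:-memory_metrics.log}"
  local url="${3:-127.0.0.1:9184/metrics}"
  local ts
  while true; do
    ts="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    curl -s "$url" 2>/dev/null \
      | filter_memory_metrics \
      | awk -v ts="$ts" '{ print ts, $0 }' >> "$out_file"
    sleep "$interval_seconds"
  done
}
```

After sourcing the snippet, running sample_memory_metrics 60 memory_metrics.log in the background for a few hours yields a log that can be attached in place of screenshots.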
Next, if the root cause cannot be identified from metrics, sui-node memory profiles need to be collected.
- Install jemalloc, jeprof and graphviz. On Ubuntu, the command is
  sudo apt install libjemalloc-dev graphviz
- Run sui-node with jemalloc as the memory allocator and export its memory profiles, via environment variables. On Ubuntu, this can be specified:
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so MALLOC_CONF=prof:true,prof_prefix:/opt/sui/jeprof/jeprof.out,lg_prof_interval:36
  - Even if not collecting memory profiles, using jemalloc is recommended.
  - Increasing and decreasing profile dump frequency can be achieved by decreasing and increasing lg_prof_interval respectively.
  - The directory specified for prof_prefix, for example /opt/sui/jeprof/, must be created with permissions allowing writes from the sui-node process.
- After sui-node runs for a while, the latest memory profiles can be found via
  ls -1t /opt/sui/jeprof/ | head -20
- A memory profile selected above can be visualized with
  sudo jeprof --svg --lines /opt/sui/bin/sui-node /opt/sui/jeprof/<file name selected above> > jeprof.svg
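The profiling setup above can be wrapped in a few bash helpers. This is only a sketch under stated assumptions: the function names are hypothetical, the library and binary paths are the Ubuntu paths used in this document, and sui-node's own arguments are passed through untouched. In jemalloc, lg_prof_interval is the base-2 logarithm of the average number of bytes allocated between profile dumps, so a value of 36 corresponds to roughly 64 GiB of allocation activity per dump, and lowering it by one doubles the dump frequency.

```shell
# Bytes of allocation activity between dumps for a given lg_prof_interval.
# prof_interval_bytes 36 -> 68719476736 (64 GiB)
prof_interval_bytes() {
  echo $(( 1 << $1 ))
}

# Build the MALLOC_CONF value from a profile prefix and an lg_prof_interval.
build_malloc_conf() {
  local prefix="$1" lg_interval="$2"
  echo "prof:true,prof_prefix:${prefix},lg_prof_interval:${lg_interval}"
}

# Create the profile directory, then start sui-node with jemalloc preloaded.
# Usage (hypothetical): run_with_jemalloc_profiling /opt/sui/jeprof 36 <sui-node args>
# Assumption: this runs as the same user as sui-node, so the directory
# created here is writable by the process.
run_with_jemalloc_profiling() {
  local prof_dir="$1" lg_interval="$2"
  shift 2
  mkdir -p "$prof_dir"
  LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so \
  MALLOC_CONF="$(build_malloc_conf "${prof_dir%/}/jeprof.out" "$lg_interval")" \
    /opt/sui/bin/sui-node "$@"
}
```

If sui-node is started by a service manager instead, the same two environment variables would need to be set in that service's configuration rather than through a wrapper like this.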
A visualized heap profile will significantly help with locating the issue.
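The last two steps (picking the newest profile and rendering it) can be combined. The bash sketch below uses hypothetical function names and defaults to the paths used in this document; it assumes jeprof and graphviz are already installed and that sudo is needed to read the profile files, as in the command above.

```shell
# Print the full path of the most recently modified file in a directory.
# Prints nothing if the directory is empty or missing.
latest_profile() {
  local prof_dir="$1" newest
  newest="$(ls -1t "$prof_dir" 2>/dev/null | head -n 1)"
  if [ -n "$newest" ]; then
    echo "${prof_dir%/}/${newest}"
  fi
}

# Render the newest profile to an SVG file with jeprof.
# Arguments (all optional): profile directory, sui-node binary, output file.
render_latest_profile() {
  local prof_dir="${1:-/opt/sui/jeprof}"
  local binary="${2:-/opt/sui/bin/sui-node}"
  local out="${3:-jeprof.svg}"
  local profile
  profile="$(latest_profile "$prof_dir")"
  if [ -z "$profile" ]; then
    echo "no profiles found in ${prof_dir}" >&2
    return 1
  fi
  sudo jeprof --svg --lines "$binary" "$profile" > "$out"
}
```

Rendering two profiles taken some time apart and comparing the resulting graphs is often more informative than a single snapshot, since growth shows up as call paths whose share increases between the two.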