When training ML models, you may find that the Linux out-of-memory (OOM) killer terminates the process in the middle of training, even on high-RAM machines (e.g., 64 GiB of RAM). This happens when the training process exhausts available memory and there is no swap space to absorb the overflow.
To solve this, consider enabling ZRAM, a Linux kernel feature that provides compressed swap space in RAM, giving fast, efficient memory overflow handling without relying on slow disk-based swap.
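Before enabling ZRAM, it can help to check whether any swap is already configured and how much RAM is actually available; the quick check below uses only standard Linux utilities:

swapon --show   # lists active swap devices; empty output means no swap is configured
free -h         # shows total, used, and available RAM in human-readable units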
Install ZRAM tools
sudo apt update
sudo apt install zram-tools
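If you want to confirm the package installed correctly before continuing, a quick status query works (dpkg is standard on Debian/Ubuntu):

dpkg -s zram-tools | grep Status   # should report "install ok installed"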
Configure ZRAM
sudo nano /etc/default/zramswap
Add the following:
ENABLED=true
ALGO=zstd
PERCENT=90
PRIORITY=100
PERCENT=90 allows up to 90% of total RAM (~58 GiB on a 64 GiB machine) to be used as compressed swap; in many cases 50% is enough. ZRAM swap is used only when regular RAM is exhausted.
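To see what PERCENT=90 translates to on your own machine, you can compute 90% of total RAM from /proc/meminfo; this is just an illustrative one-liner, not part of the zram-tools configuration:

awk '/MemTotal/ {printf "ZRAM size at 90%%: %.1f GiB\n", $2 * 0.9 / 1048576}' /proc/meminfo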
Start and enable the ZRAM service:
sudo systemctl enable zramswap
sudo systemctl start zramswap
Restart if needed (e.g., after changing config):
sudo systemctl restart zramswap
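If the service fails to start or the new settings do not seem to take effect, the unit status and its log usually show what went wrong (assuming zram-tools installed the zramswap service, as it does on Debian/Ubuntu):

sudo systemctl status zramswap
journalctl -u zramswap --no-pager   # full service log if the status output is not enough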
Check ZRAM is active:
swapon --show
OR
sudo zramctl
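To confirm that the PRIORITY=100 setting was applied, /proc/swaps lists each swap device together with its size, usage, and priority:

cat /proc/swaps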
While training your model, you can monitor memory and swap usage in real time:
watch -n 1 free -h
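If you want a record of memory usage over the whole training run rather than a live view, a simple loop can append periodic snapshots to a log file; mem_usage.log is just an example filename:

while true; do date; free -h; echo; sleep 10; done >> mem_usage.log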