When training ML models, you may find that the Linux out-of-memory (OOM) killer terminates the process in the middle of training, even on high-RAM machines (e.g., 64 GiB of RAM). This happens when the training process exhausts available memory and there is no swap space to absorb the overflow.
To solve this, consider enabling ZRAM, a Linux kernel feature that provides compressed swap space in RAM, giving fast, efficient memory overflow handling without relying on slow disk-based swap.
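Before enabling ZRAM, it can help to check whether any swap is already configured and how much RAM is actually available; the quick check below uses only standard Linux utilities:

swapon --show   # lists active swap devices; empty output means no swap is configured
free -h         # shows total, used, and available RAM in human-readable units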
Install ZRAM tools
sudo apt update
sudo apt install zram-tools
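If you want to confirm the package installed correctly before continuing, a quick status query works (dpkg is standard on Debian/Ubuntu):

dpkg -s zram-tools | grep Status   # should report "install ok installed"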
Configure ZRAM
sudo nano /etc/default/zramswap
Add the following:
ENABLED=true
ALGO=zstd
PERCENT=90
PRIORITY=100
PERCENT=90 allows up to 90% of total RAM (~58 GiB on a 64 GiB machine) to be used as compressed swap; in many cases 50% is enough. ZRAM swap is used only when regular RAM is exhausted.
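To see what PERCENT=90 translates to on your own machine, you can compute 90% of total RAM from /proc/meminfo; this is just an illustrative one-liner, not part of the zram-tools configuration:

awk '/MemTotal/ {printf "ZRAM size at 90%%: %.1f GiB\n", $2 * 0.9 / 1048576}' /proc/meminfo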
Start and enable the ZRAM service:
sudo systemctl enable zramswap
sudo systemctl start zramswap
Restart if needed (e.g., after changing config):
sudo systemctl restart zramswap
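If the service fails to start or the new settings do not seem to take effect, the unit status and its log usually show what went wrong (assuming zram-tools installed the zramswap service, as it does on Debian/Ubuntu):

sudo systemctl status zramswap
journalctl -u zramswap --no-pager   # full service log if the status output is not enough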
Check ZRAM is active:
swapon --show
OR
sudo zramctl
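To confirm that the PRIORITY=100 setting was applied, /proc/swaps lists each swap device together with its size, usage, and priority:

cat /proc/swaps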
While training your model, you can monitor memory and swap usage in real time:
watch -n 1 free -h
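If you want a record of memory usage over the whole training run rather than a live view, a simple loop can append periodic snapshots to a log file; mem_usage.log is just an example filename:

while true; do date; free -h; echo; sleep 10; done >> mem_usage.log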