Hardware: AMD Radeon RX 9070 XT (16GB VRAM)
OS: Ubuntu 22.04 / Linux
ROCm: 7.0 / 7.1 Preview
Goal: Stable text-to-video generation with Wan 2.1 (14B) without crashes or OOMs.
Running Wan 2.1 on RDNA 4 currently triggers frequent `HIP error: illegal memory access` crashes, or immediate OOMs during VAE decoding, due to kernel conflicts with PyTorch TunableOp and memory fragmentation.
Save this as `run_wan_safe.sh`. The specific environment variables are critical.
#!/bin/bash
# 1. DISABLE System Direct Memory Access (SDMA)
# Prevents data corruption during heavy GGUF transfers on RDNA 4.
export HSA_ENABLE_SDMA=0
# 2. DISABLE PyTorch TunableOp
# Crucial. While TunableOp helps Flux, it causes "Illegal Memory Access"
# crashes with Wan 2.1 kernels on Navi 4x.
export PYTORCH_TUNABLEOP_ENABLED=0
# 3. ENABLE Triton Backend for Flash Attention
# The default Composable Kernel (CK) backend often fails on RDNA 4.
# Requires flash-attn to be built with this var set.
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
# 4. Aggressive Memory Fragmentation Control
# Forces PyTorch to split blocks earlier (128MB) and GC sooner (60%).
export PYTORCH_HIP_ALLOC_CONF="garbage_collection_threshold:0.6,max_split_size_mb:128,expandable_segments:True"
echo "Launch Config:"
echo " SDMA: OFF (Stability)"
echo " TunableOp: OFF (Fix Illegal Access)"
echo " Triton FA: ON (Performance)"
echo " HIP Alloc: Optimized for 16GB"
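# Sanity check (optional addition, not part of the original script):
# PyTorch reads these variables once at interpreter startup, so they
# must be set in this shell before python3 launches. Abort early if
# any are missing rather than crash mid-generation.
for v in HSA_ENABLE_SDMA PYTORCH_TUNABLEOP_ENABLED PYTORCH_HIP_ALLOC_CONF; do
  [ -n "${!v}" ] || { echo "ERROR: $v is not set"; exit 1; }
done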
# Launch ComfyUI with Low VRAM mode to force aggressive offloading
python3 main.py --lowvram --use-split-cross-attention

Even with the script, you will OOM during the final VAE Decode step unless you use these settings:
- Node: Use `VAEDecodeTiled` (not the standard `VAEDecode`).
- Tile Size: `256` (the default 512 is too large for 16GB VRAM + a 14B model).
- Temporal Tiling: `16` (helps smooth out the decoding).
- Overlap: `64`.
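Tiled decoding keeps peak VRAM proportional to one tile rather than the whole frame: the image is split into overlapping tiles, each is decoded separately, and the overlap regions are blended. A minimal sketch of the spatial tiling arithmetic (pure Python, illustrative only; the function name and layout are my assumptions, not ComfyUI's code):

```python
def tile_starts(extent: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of overlapping tiles covering [0, extent).

    Illustrative helper (not ComfyUI's implementation): consecutive
    tiles advance by (tile - overlap), and the last tile is clamped
    so it ends exactly at the frame edge.
    """
    stride = tile - overlap
    starts = list(range(0, max(extent - overlap, 1), stride))
    if starts[-1] + tile > extent:
        starts[-1] = max(extent - tile, 0)
    return starts

# An 832x480 frame with tile_size=256 and overlap=64:
print(tile_starts(832, 256, 64))  # [0, 192, 384, 576]
print(tile_starts(480, 256, 64))  # [0, 192, 224]
```

The default 512 tile makes each decode pass cover four times the pixels of a 256 tile, which is what pushes a 16GB card over the edge with a 14B model resident.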
You must build flash-attention from source with the Triton flag enabled:
export FLASH_ATTENTION_TRITON_AMD_ENABLE="TRUE"
pip install git+https://github.com/Dao-AILab/flash-attention.git
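Once the build finishes, a quick pre-flight check before launching ComfyUI (a minimal sketch: it only confirms the package is importable and the variable is visible to the process, not that the Triton kernels actually compiled):

```python
import importlib.util
import os

# Both should hold in the same shell that will run python3 main.py:
# the wheel importable, and the Triton switch exported.
installed = importlib.util.find_spec("flash_attn") is not None
enabled = os.environ.get("FLASH_ATTENTION_TRITON_AMD_ENABLE") == "TRUE"
print(f"flash_attn installed: {installed}")
print(f"Triton backend flag set: {enabled}")
```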