@danielrosehill
Created December 12, 2025 12:29
Conda Environment Strategy for AMD ROCm GPUs - Minimizing Disk Usage

Managing conda environments for AI/ML workloads on AMD GPUs can quickly consume 100GB+ of disk space. The PyTorch ROCm stack alone is ~18GB, and each new environment duplicates this.

The Problem

Creating environments independently:

# BAD: installing the full PyTorch ROCm stack separately into each env
# duplicates ~18GB per environment
conda create -n project1 python=3.12   # then pip install the ROCm torch stack
conda create -n project2 python=3.12   # then pip install it again
# Result: 36GB+ for two environments that differ by only a few packages

The Solution: Clone from Base

Create ONE well-tested base environment with PyTorch ROCm, then clone it:

# 1. Create base environment once
conda create -n pytorch-rocm-base python=3.12
conda activate pytorch-rocm-base
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2

# 2. Clone for new projects (hard-links files on the same filesystem - no duplication)
conda create --clone pytorch-rocm-base --name my-project

# 3. Add project-specific packages on top
conda activate my-project
pip install openai-whisper transformers  # only these are added on top of the clone
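The savings from `--clone` come from hard links: files unchanged from the base environment point at the same inode, so their data occupies disk space only once. A minimal Python sketch of the mechanism itself (simulated filenames, not conda's internals):

```python
import os
import tempfile

# Simulate how a cloned env shares an unchanged package file with its base:
# a hard link adds a second directory entry, not a second copy of the data.
tmp = tempfile.mkdtemp()
base = os.path.join(tmp, "base_env_libtorch.so")
clone = os.path.join(tmp, "cloned_env_libtorch.so")

with open(base, "wb") as f:
    f.write(b"\x00" * (1024 * 1024))  # stand-in for a large shared library

os.link(base, clone)  # hard link: same inode, no extra data blocks

st_base, st_clone = os.stat(base), os.stat(clone)
print(st_base.st_ino == st_clone.st_ino)  # True: two names, one inode
print(st_base.st_nlink)                   # 2: two directory entries
```

If a cloned environment later upgrades a package, conda replaces the link with a fresh copy, so clones only start costing real space as they diverge from the base.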

Disk Savings

| Approach    | 5 Environments          | Disk Usage |
|-------------|-------------------------|------------|
| Independent | 5 × 18GB base stacks    | ~90GB      |
| Cloned      | 1 × 18GB base + extras  | ~25GB      |

Helper Script

#!/bin/bash
# create-env.sh - Create a project environment cloned from the base
set -e

ENV_NAME="$1"
REQUIREMENTS="$2"

if [ -z "$ENV_NAME" ]; then
    echo "Usage: $0 <env-name> [requirements.txt]" >&2
    exit 1
fi

conda create --clone pytorch-rocm-base --name "$ENV_NAME" -y

# `conda activate` only works in scripts after loading the shell hook
eval "$(conda shell.bash hook)"
conda activate "$ENV_NAME"

if [ -f "$REQUIREMENTS" ]; then
    pip install -r "$REQUIREMENTS"
fi

echo "Environment '$ENV_NAME' ready!"

ROCm Notes

For AMD RX 7000 series (gfx1101/Navi 32):

# May need GFX version override
export HSA_OVERRIDE_GFX_VERSION=11.0.1

# Verify GPU detection
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
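For anything beyond a one-liner, a slightly more defensive check degrades gracefully when PyTorch or the GPU is missing. A sketch (the function name is just for illustration; ROCm builds of PyTorch expose AMD GPUs through the `torch.cuda` API, so `torch.cuda.is_available()` is the right call here too):

```python
def gpu_status():
    """Return a human-readable status string for the ROCm/PyTorch setup."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "No GPU visible; check HSA_OVERRIDE_GFX_VERSION and the ROCm install"
    name = torch.cuda.get_device_name(0)
    x = torch.ones(3, device="cuda")  # tiny end-to-end smoke test
    return f"GPU OK: {name}, sum={float((x + x).sum())}"

print(gpu_status())
```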

Maintenance

# Check disk usage
du -sh ~/miniconda3/envs/*

# Clean package cache
conda clean --all

# Remove unused environments
conda env remove -n old-project

Category-Based Requirements

Organize requirements.txt files by use case:

envs/
├── stt/requirements.txt          # Whisper, WhisperX
├── tts/requirements.txt          # Chatterbox, Bark
├── image-gen/requirements.txt    # diffusers, ComfyUI deps
├── llm/requirements.txt          # transformers, langchain
└── data/requirements.txt         # pandas, plotly

Then create targeted environments:

conda create --clone pytorch-rocm-base --name whisper-work
conda activate whisper-work
pip install -r envs/stt/requirements.txt
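To bootstrap that layout, a throwaway scaffolding sketch (categories and package names mirror the examples above and are assumptions; pin your own versions in practice):

```python
import os

# Illustrative categories; the package lists are examples, not a tested set.
CATEGORIES = {
    "stt": ["openai-whisper", "whisperx"],
    "tts": ["bark"],
    "image-gen": ["diffusers"],
    "llm": ["transformers", "langchain"],
    "data": ["pandas", "plotly"],
}

def scaffold(base_dir="envs"):
    """Create envs/<category>/requirements.txt for each category above."""
    for category, packages in CATEGORIES.items():
        cat_dir = os.path.join(base_dir, category)
        os.makedirs(cat_dir, exist_ok=True)
        with open(os.path.join(cat_dir, "requirements.txt"), "w") as f:
            f.write("\n".join(packages) + "\n")

scaffold()
```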

This approach saved me ~40GB while maintaining separate, reproducible environments for different AI/ML workflows.


This gist was generated by Claude Code. Please verify any information before relying on it.
