@danielrosehill
Created December 7, 2025 11:41
Docker Layering For PyTorch ROCm In Action: How to Build Efficient AI/ML Image Stacks

When building multiple AI/ML Docker images that all need PyTorch with ROCm (AMD GPU) support, naive approaches can waste tens of gigabytes of disk space. This guide shows how Docker's layer sharing works and how to verify your images are efficiently layered.

The Problem: Misleading Image Sizes

Running docker images shows seemingly massive duplication:

REPOSITORY          TAG       SIZE
whisperx-rocm       latest    30.7GB
whisper-rocm        latest    29.7GB
rocm/pytorch        latest    29.3GB

At first glance, this looks like ~90GB of disk usage. But is it really?

The Reality: Shared Layers

These images are actually layered on top of each other:

rocm/pytorch (29.3GB base)
  └── whisper-rocm (+471MB unique)
      └── whisperx-rocm (+998MB unique)

Actual disk usage: ~31GB (not 90GB!)
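
A quick back-of-the-envelope check of those two numbers, using the rounded megabyte figures from the example above:

```shell
# Apparent usage: sum of each image's reported SIZE (shared layers are
# counted once per image). Actual usage: base image plus unique layers only.
# Figures are the rounded MB values from the example above.
apparent=$((30700 + 29700 + 29300))
actual=$((29300 + 471 + 998))
echo "apparent: ${apparent} MB"   # ~90GB
echo "actual:   ${actual} MB"     # ~31GB
```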

How to Check Real Disk Usage

Command 1: docker system df -v

This shows the SHARED SIZE vs UNIQUE SIZE for each image:

docker system df -v

Output:

REPOSITORY          TAG       SIZE      SHARED SIZE   UNIQUE SIZE
whisperx-rocm       latest    30.7GB    29.73GB       997.9MB
whisper-rocm        latest    29.7GB    29.73GB       0B
rocm/pytorch        latest    29.3GB    29.26GB       0B

Key insight: whisperx-rocm reports 30.7GB total, but only 997.9MB of that is unique - the rest is shared with its parent layers. whisper-rocm shows 0B unique here because every one of its layers is also part of whisperx-rocm.
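
To pull out just the unique sizes programmatically, you can filter the table with awk. This is a sketch: the heredoc stands in for real docker system df -v output, and the real table has extra columns (IMAGE ID, CREATED, CONTAINERS), so adjust the column numbers for your Docker version:

```shell
# Print repository and unique size from a simplified copy of the table.
# In this simplified 5-column layout, $1 = REPOSITORY and $5 = UNIQUE SIZE.
awk 'NR > 1 { print $1, $5 }' <<'EOF'
REPOSITORY TAG SIZE SHARED_SIZE UNIQUE_SIZE
whisperx-rocm latest 30.7GB 29.73GB 997.9MB
whisper-rocm latest 29.7GB 29.73GB 0B
rocm/pytorch latest 29.3GB 29.26GB 0B
EOF
```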

Command 2: docker history

See how an image was built and what each layer adds:

docker history whisper-rocm:latest --no-trunc | head -15

This reveals the base image and what packages were added on top.

Command 3: Total disk summary

docker system df

Shows aggregate disk usage across all images, containers, and volumes.

Building Efficient Layered Images

Pattern: Use a Common Base Image

Instead of each AI tool installing its own PyTorch + ROCm stack, layer them:

Bad (standalone images):

# whisper/Dockerfile - 30GB standalone
FROM ubuntu:22.04
RUN install-rocm-from-scratch   # placeholder for a manual ROCm install
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
RUN pip install openai-whisper flask gunicorn

# chatterbox/Dockerfile - another 25GB standalone
FROM ubuntu:22.04
RUN install-rocm-from-scratch   # placeholder for a manual ROCm install
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
RUN pip install chatterbox-tts fastapi uvicorn

Good (layered on common base):

# whisper/Dockerfile - only adds ~500MB
FROM rocm/pytorch:latest

RUN apt-get update && apt-get install -y ffmpeg git \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir \
    openai-whisper flask flask-cors gunicorn

# chatterbox/Dockerfile - only adds ~500MB
FROM rocm/pytorch:latest

RUN apt-get update && apt-get install -y ffmpeg \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --no-cache-dir \
    chatterbox-tts fastapi uvicorn

Even Better: Chain Your Images

If Whisper and WhisperX share dependencies, chain them:

# whisperx/Dockerfile - builds on whisper, adds ~1GB
FROM whisper-rocm:latest

RUN pip install whisperx

Real-World Example: AI Audio Stack

Here's an efficient stack for audio AI on AMD GPUs:

Image                 Total Size   Unique Size   Purpose
rocm/pytorch:latest   29.3GB       29.3GB        Base (PyTorch + ROCm)
whisper-rocm          29.7GB       471MB         Speech-to-text API
whisperx-rocm         30.7GB       998MB         Enhanced STT with alignment
chatterbox-tts        ~30GB        ~500MB        Text-to-speech with voice cloning

Total apparent size: 120GB
Actual disk usage: ~31GB
Space saved: ~89GB (74% reduction)
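
Sanity-checking the headline numbers with integer arithmetic:

```shell
# GB figures from the table above, rounded to whole numbers
apparent=120   # sum of the Total Size column
actual=31      # base image plus unique layers
saved=$((apparent - actual))
pct=$((saved * 100 / apparent))
echo "saved ${saved}GB (${pct}%)"
```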

Key Takeaways

  1. Always check docker system df -v - the SIZE column in docker images double-counts shared layers
  2. Use official base images like rocm/pytorch instead of building from scratch
  3. Chain related images - if B needs everything A has, build B FROM A
  4. Order Dockerfile commands wisely - put rarely-changing layers first
  5. Clean up dangling images with docker image prune
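
Takeaway 4 in practice: order instructions from least to most frequently changed, so edits to your app code only rebuild the final layers. A sketch (the file paths are illustrative):

```dockerfile
FROM rocm/pytorch:latest
# Huge, stable base: almost never changes

# System packages change rarely
RUN apt-get update && apt-get install -y ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Python deps change occasionally; copying only requirements.txt first
# means editing app code does not invalidate this layer's cache
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r /app/requirements.txt

# App code changes most often, so it goes last
COPY . /app/
```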

Useful Commands Reference

# Check actual disk usage with layer sharing info
docker system df -v

# See image layer history
docker history IMAGE_NAME --no-trunc

# Clean up dangling (untagged) images
docker image prune -f

# Clean up everything unused (careful!)
docker system prune -a

# See what base image was used (often empty for BuildKit-built images)
docker inspect IMAGE_NAME | jq '.[0].Config.Image'

Hardware Context

This guide was developed on:

  • GPU: AMD Radeon RX 7700 XT (gfx1101, Navi 32)
  • ROCm: 6.x / 7.x
  • Base Image: rocm/pytorch:latest

The same principles apply to NVIDIA setups with nvidia/cuda or pytorch/pytorch base images.


This gist was generated by Claude Code. Please verify any information before relying on it.
