@In-line
Last active November 1, 2024 18:28
AMD 7900 XTX Stable Diffusion Web UI docker container (ROCM 5.5_rc4)

To use this container, first install docker and docker-compose.

You will then need the since-deleted rocm/composable_kernel:ck_ub20.04_rocm5.5_rc4 Docker image (an alpha build of ROCm). You can download the image using the BitTorrent magnet link in the files of this gist. After downloading the image, load it with docker load --input rocm5.5.tar.gz

Running is as simple as: sudo docker-compose up --build. After that, the Stable Diffusion Web UI can be accessed at http://127.0.0.1:3000
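
Because the image build can take a long time, it may help to check whether the Web UI is actually listening before opening the browser. The following is a minimal sketch, not part of the gist; the helper name port_is_open is hypothetical, and port 3000 is taken from the address above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# 3000 is the host side of the port mapping used by this gist.
print("Web UI reachable:", port_is_open("127.0.0.1", 3000))
```

A False result only means nothing is accepting connections yet; the container may still be building or loading a model.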

Many thanks to @wsippel for putting together a step-by-step guide to get it working.

Disclaimer

This was tested on kernel 6.2.10 and Mesa 23.0.0.

version: "3.9"
services:
  sd:
    build: .
    ports:
      - "3000:7860"
    volumes:
      - ./models:/SD/stable-diffusion-webui/models/
      - ./repositories:/SD/stable-diffusion-webui/repositories/
      - ./extensions:/SD/stable-diffusion-webui/extensions/
      - ./outputs:/SD/stable-diffusion-webui/outputs/
    devices:
      - '/dev/kfd:/dev/kfd'
      - '/dev/dri:/dev/dri'
    security_opt:
      - seccomp:unconfined
    group_add:
      - video
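
The compose file passes /dev/kfd and /dev/dri from the host into the container, so both must exist on the host before the container can use the GPU. A small sketch of a host-side check follows; the function name missing_devices is hypothetical and not part of the gist.

```python
import os


def missing_devices(paths=("/dev/kfd", "/dev/dri")):
    """Return the subset of paths that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]


missing = missing_devices()
if missing:
    print("Missing device nodes:", ", ".join(missing))
else:
    print("Both /dev/kfd and /dev/dri are present.")
```

If a node is missing, the amdgpu kernel driver is probably not loaded; this check says nothing about group permissions on the nodes.
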
FROM rocm/composable_kernel:ck_ub20.04_rocm5.5_rc4
RUN mkdir /SD
# Clone SD
WORKDIR /SD
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
WORKDIR /SD/stable-diffusion-webui
RUN git reset --hard 22bcc7be428c94e9408f589966c2040187245d81
RUN apt update && apt install -y python3.8-venv
RUN python3 -m venv venv
# Activate VENV
ENV VIRTUAL_ENV=/SD/stable-diffusion-webui/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN python3 -m pip install --upgrade pip wheel
ENV HIP_VISIBLE_DEVICES=0
ENV PYTORCH_ROCM_ARCH="gfx1100"
ENV CMAKE_PREFIX_PATH=/SD/stable-diffusion-webui/venv/
ENV USE_CUDA=0
# Setup patched folder & compile dependencies
RUN mkdir -p /SD/stable-diffusion-webui/patched
RUN pip install cmake ninja
WORKDIR patched
# Remove old torch and torchvision
RUN pip uninstall -y torch torchvision
# Build pytorch
RUN wget https://github.com/pytorch/pytorch/releases/download/v2.0.0/pytorch-v2.0.0.tar.gz
RUN tar -xzvf pytorch-v2.0.0.tar.gz
WORKDIR /SD/stable-diffusion-webui/patched/pytorch-v2.0.0
RUN pip install -r requirements.txt
RUN pip install mkl mkl-include
RUN python3 tools/amd_build/build_amd.py
RUN python3 setup.py install
# Build vision
WORKDIR /SD/stable-diffusion-webui/patched/
RUN wget https://github.com/pytorch/vision/archive/refs/tags/v0.15.1.tar.gz
RUN tar -xzvf v0.15.1.tar.gz
WORKDIR /SD/stable-diffusion-webui/patched/vision-0.15.1
RUN python3 setup.py install
WORKDIR /SD/stable-diffusion-webui
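# Patch requirements.txt to remove torch (-i edits the file in place; the anchored pattern keeps other packages whose names merely start with "torch")
RUN sed -i -E '/^torch([^A-Za-z0-9_-]|$)/d' requirements.txt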
RUN pip install -r requirements.txt
EXPOSE 7860/tcp
# Fix for "detected dubious ownership in repository" by rom1win.
RUN git config --global --add safe.directory '*'
CMD python3 launch.py --listen --disable-safe-unpickle
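
The requirements patch step in the Dockerfile above is meant to stop pip from reinstalling torch over the locally built one. A broad pattern such as /torch/ would also match any other requirement whose name contains "torch". The sketch below shows, in Python, a filter that drops only the torch package itself. The function name drop_torch is hypothetical, and the sample requirement names are illustrative, not copied from the repository's requirements.txt.

```python
import re

# Matches "torch" alone or followed by a version specifier such as "==2.0.0",
# but not longer package names that merely start with "torch".
TORCH_LINE = re.compile(r"^torch([^A-Za-z0-9_-]|$)")


def drop_torch(requirements_text):
    """Return requirements_text without lines that require the torch package."""
    kept = [
        line
        for line in requirements_text.splitlines()
        if not TORCH_LINE.match(line.strip())
    ]
    return "\n".join(kept) + "\n"


# Illustrative input only.
print(drop_torch("torch\ntorchdiffeq\ntorch==2.0.0\nnumpy\n"))
```
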

evshiron commented Jul 14, 2023

@GianlucaMattei

Hmmm, the outputs of rocminfo and rocm-smi look good.

Can you remove --skip-torch-cuda-test and launch again? When it fails, try sudo dmesg and see if there are any abnormal logs from amdgpu.

Also, if you have Docker daemon (not Docker Desktop) installed, would you mind running:

docker run -ti --net=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -e HSA_OVERRIDE_GFX_VERSION=11.0.0 --name rocm5.5-automatic ghcr.io/evshiron/rocm_lab:rocm5.5-automatic

This was a working example a few weeks ago. If it doesn't work, can you post the logs here?
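
For the dmesg check suggested above, scanning the output by eye is tedious because amdgpu prints many harmless initialization lines. Here is a rough sketch of a filter; the keyword list is an assumption, not an official amdgpu error taxonomy, and the function name is hypothetical.

```python
import re

# Assumed keywords that often accompany driver problems.
SUSPICIOUS = re.compile(r"error|fail|fault|timeout|hang", re.IGNORECASE)


def suspicious_amdgpu_lines(dmesg_text):
    """Return dmesg lines that mention amdgpu and contain a suspicious keyword."""
    return [
        line
        for line in dmesg_text.splitlines()
        if "amdgpu" in line and SUSPICIOUS.search(line)
    ]


# Usage idea: save the log with `sudo dmesg > dmesg.txt`, then call
# suspicious_amdgpu_lines(open("dmesg.txt").read()) and inspect the result.
```

An empty result does not prove the driver is healthy; it only means none of the assumed keywords appeared.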


GianlucaMattei commented Jul 16, 2023

Ok, this is the log from amdgpu you were looking for (I think)
It seems there are a lot of problems ....

sudo dmesg | grep amdgpu

[    1.803496] [drm] amdgpu kernel modesetting enabled.
[    1.803498] [drm] amdgpu version: 6.1.5
[    1.803541] amdgpu: CRAT table not found
[    1.803542] amdgpu: Virtual CRAT table created for CPU
[    1.803547] amdgpu: Topology: Add CPU node
[    1.858931] amdgpu: PeerDirect support was initialized successfully
[    1.858996] amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
[    1.861612] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
[    1.861615] amdgpu: ATOM BIOS: 113-D70401XT-P10
[    1.862014] amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
[    1.862331] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    1.862976] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
[    1.862980] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    1.863034] amdgpu 0000:03:00.0: amdgpu: MEM ECC is not presented.
[    1.863036] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is not presented.
[    1.863112] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x4010000000-0x40101fffff 64bit pref]
[    1.863116] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x4000000000-0x400fffffff 64bit pref]
[    1.863161] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x4800000000-0x4fffffffff 64bit pref]
[    1.863173] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x4400000000-0x44001fffff 64bit pref]
[    1.863250] amdgpu 0000:03:00.0: amdgpu: VRAM: 20464M 0x0000008000000000 - 0x00000084FEFFFFFF (20464M used)
[    1.863254] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    1.863256] amdgpu 0000:03:00.0: amdgpu: AGP: 267878400M 0x0000008800000000 - 0x0000FFFFFFFFFFFF
[    1.863333] [drm] amdgpu: 20464M of VRAM memory ready
[    1.863335] [drm] amdgpu: 7897M of GTT memory ready.
[    1.864233] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[    2.112892] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    2.112898] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    2.112928] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000037, smu fw if version = 0x0000003d, smu fw program = 0, smu fw version = 0x004e5800 (78.88.0)
[    2.112934] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[    2.271382] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[    2.477319] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[    2.478964] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    2.478983] amdgpu: sdma_bitmap: fff0
[    2.546219] amdgpu: HMM registered 20464MB device memory
[    2.546228] amdgpu: SRAT table not found
[    2.546228] amdgpu: Virtual CRAT table created for GPU
[    2.546347] amdgpu: Topology: Add dGPU node [0x744c:0x1002]
[    2.546349] kfd kfd: amdgpu: added device 1002:744c
[    2.546360] amdgpu 0000:03:00.0: amdgpu: SE 6, SH per SE 2, CU per SH 8, active_cu_number 84
[    2.546419] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    2.546420] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    2.546421] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    2.546421] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[    2.546422] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[    2.546422] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[    2.546423] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[    2.546423] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[    2.546424] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[    2.546424] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    2.546425] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[    2.546425] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 1
[    2.546426] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 1
[    2.546426] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 1
[    2.546427] amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[    2.548877] amdgpu: legacy kernel without apple_gmux_detect()
[    2.548990] amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
[    2.549204] [drm] Initialized amdgpu 3.53.0 20150101 for 0000:03:00.0 on minor 0
[    2.557099] fbcon: amdgpudrmfb (fb0) is primary device
[    2.557101] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[    4.994714] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])

Running :

docker run -ti --net=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -e HSA_OVERRIDE_GFX_VERSION=11.0.0 --name rocm5.5-automatic ghcr.io/evshiron/rocm_lab:rocm5.5-automatic

I get

docker: unknown server OS: .
See 'docker run --help'.

But I don't know if I have Docker daemon or Docker Desktop ... I didn't know there were two Dockers at all....

Edit:

I ran the last command with sudo (Docker commands only work with sudo for me...)

sudo docker run -ti --net=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -e HSA_OVERRIDE_GFX_VERSION=11.0.0 --name rocm5.5-automatic ghcr.io/evshiron/rocm_lab:rocm5.5-automatic
docker: Error response from daemon: Conflict. The container name "/rocm5.5-automatic" is already in use by container "1440832e215a029b3886b26c055055a220ada45e2c9cd5c79048f4844ea89d5c". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.

Thus I did

sudo docker remove 1440832e215a029b3886b26c055055a220ada45e2c9cd5c79048f4844ea89d5c

then

sudo docker run -ti --net=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -e HSA_OVERRIDE_GFX_VERSION=11.0.0 --name rocm5.5-automatic ghcr.io/evshiron/rocm_lab:rocm5.5-automatic

works, even if it produces very poor images

BTW:
now launching

python3 launch.py --listen --enable-insecure-extension-access --opt-sdp-attention --skip-torch-cuda-test

returns (I'll write only the last line of the output)

OSError: libtorch_hip.so: cannot open shared object file: No such file or directory

Sorry for writing a book but outputs are very long...
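
The libtorch_hip.so error above usually suggests that the torch package in the active environment is not a ROCm build, or that its shared libraries are incomplete. The sketch below looks for the library without importing torch. It assumes the library lives in the package's lib directory, which is how PyTorch wheels are commonly laid out; the function name find_torch_lib is hypothetical.

```python
import importlib.util
from pathlib import Path


def find_torch_lib(lib_name="libtorch_hip.so", package="torch"):
    """Return the path to lib_name inside the installed package, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # the package is not installed in this environment
    for root in spec.submodule_search_locations:
        candidate = Path(root) / "lib" / lib_name  # assumed layout
        if candidate.exists():
            return candidate
    return None


print(find_torch_lib() or "libtorch_hip.so not found in this environment's torch")
```

Run it with the same interpreter (for example the venv's python3) that launches the Web UI, since each environment has its own torch.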


evshiron commented Jul 17, 2023

@GianlucaMattei

> Ok, this is the log from amdgpu you were looking for (I think)

The dmesg logs you posted are normal initialization logs when the system booted, and it looks OK.

> works, even if it produces very poor images

So it generates. For image quality, you can choose various models and settings once it's settled.

It should be an installation issue. Let's start from scratch:

# set up groups if it hasn't been done previously, reboot is needed
sudo usermod -aG video $USER
sudo usermod -aG render $USER

mkdir test
cd test

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui

# set up venv
python3 -m venv venv
source venv/bin/activate

# install dependencies
pip3 install -r requirements.txt
# replace torch with the rocm one
pip3 uninstall torch torchvision
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.5

# download a launch script from gist, you can create it yourself if you are concerned
curl https://gist.githubusercontent.com/evshiron/8cf4de34aa01e217ce178b8ed54a2c43/raw/e5743505afe6b2a329908bbefda93d98b98940ac/launch.sh > launch.sh

# launch the webui
bash launch.sh
launch.log
(venv) user@hostname:~/test/stable-diffusion-webui$ bash launch.sh 
Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
Version: v1.4.1
Commit hash: f865d3e11647dfd6c7b2cdf90dde24680e58acd8
Installing clip
Installing open_clip
Cloning Stable Diffusion into /home/user/test/stable-diffusion-webui/repositories/stable-diffusion-stability-ai...
Cloning K-diffusion into /home/user/test/stable-diffusion-webui/repositories/k-diffusion...
Cloning CodeFormer into /home/user/test/stable-diffusion-webui/repositories/CodeFormer...
Cloning BLIP into /home/user/test/stable-diffusion-webui/repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments: --listen --enable-insecure-extension-access --opt-sdp-attention
No module 'xformers'. Proceeding without it.
Downloading: "https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors" to /home/user/test/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 3.97G/3.97G [06:22<00:00, 11.2MB/s]
Calculating sha256 for /home/user/test/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors: preload_extensions_git_metadata for 7 extensions took 0.00s
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 385.1s (import torch: 0.7s, import gradio: 0.5s, import ldm: 0.2s, other imports: 0.3s, list SD models: 382.8s, load scripts: 0.2s, create ui: 0.2s).
6ce0161689b3853acaa03779ec93eafe75a02f4ced659bee03f50797806fa2fa
Loading weights [6ce0161689] from /home/user/test/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /home/user/test/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying attention optimization: sdp... done.
Textual inversion embeddings loaded(0): 
Model loaded in 3.6s (calculate hash: 2.4s, load weights from disk: 0.1s, create model: 0.2s, apply weights to model: 0.2s, apply half(): 0.2s, move model to device: 0.2s, calculate empty prompt: 0.1s).

Now open http://127.0.0.1:7860, input river in the "Prompt" box, click the "Generate" button and it should generate an image of "river".

If it doesn't work, try the above steps with a new user, or with a fresh Ubuntu installation. Post logs if issues remain.
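
After replacing torch with the ROCm build as in the steps above, it can be useful to confirm which build is active before launching. This is a minimal sketch with a hypothetical function name; it relies on torch.version.hip being None on non-ROCm builds and on ROCm builds exposing the GPU through the torch.cuda API.

```python
def describe_torch():
    """Summarise the installed torch build without assuming torch is importable."""
    try:
        import torch
    except (ImportError, OSError):
        # OSError covers a missing shared library such as libtorch_hip.so.
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        # None on builds without ROCm/HIP support.
        "hip": getattr(torch.version, "hip", None),
        # ROCm builds report the GPU through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }


print(describe_torch())
```

A non-empty "hip" value together with "gpu_available": True suggests the ROCm wheel is installed and can see the card.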

Extra info can be found here.


Sega999 commented Jul 27, 2023

I can't get it to build; I keep getting a Python error

(vvv) system-name:~/dockstab$ sudo docker-compose up --build
Building sd
Sending build context to Docker daemon 18.7GB
Step 1/38 : FROM rocm/composable_kernel:ck_ub20.04_rocm5.5_rc4
---> 3593424bfc2d
Step 2/38 : RUN mkdir /SD
---> Using cache
---> 8ed3bb34be5b
Step 3/38 : WORKDIR /SD
---> Using cache
---> 0e8b06d6e215
Step 4/38 : RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
---> Using cache
---> 0157c649186d
Step 5/38 : WORKDIR /SD/stable-diffusion-webui
---> Using cache
---> af2b8fdbdcf9
Step 6/38 : RUN git reset --hard 22bcc7be428c94e9408f589966c2040187245d81
---> Using cache
---> 417df67433a2
Step 7/38 : RUN apt update && apt install python3.8-venv
---> Running in d0dc73637440

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Err:2 http://compute-artifactory.amd.com/artifactory/list/rocm-release-archive-20.04-deb 5.5 InRelease
Something wicked happened resolving 'compute-artifactory.amd.com:http' (-5 - No address associated with hostname)
Err:3 http://artifactory-cdn.amd.com/artifactory/list/amdgpu-deb-remote focal/builds InRelease
Something wicked happened resolving 'artifactory-cdn.amd.com:http' (-5 - No address associated with hostname)
Get:4 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:1 http://mirrors.edge.kernel.org/ubuntu focal InRelease [265 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [2597 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:8 http://mirrors.edge.kernel.org/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:13 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2909 kB]
Get:14 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [1083 kB]
Get:15 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [29.3 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [32.0 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [2738 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [3390 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1383 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [28.6 kB]
Get:22 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
Get:23 http://mirrors.edge.kernel.org/ubuntu focal/main amd64 Packages [1275 kB]
Fetched 40.6 MB in 48s (837 kB/s)
Reading package lists...
Building dependency tree...
Reading state information...
W: Failed to fetch http://artifactory-cdn.amd.com/artifactory/list/amdgpu-deb-remote/dists/focal/builds/InRelease Something wicked happened resolving 'artifactory-cdn.amd.com:http' (-5 - No address associated with hostname)
W: Failed to fetch http://compute-artifactory.amd.com/artifactory/list/rocm-release-archive-20.04-deb/dists/5.5/InRelease Something wicked happened resolving 'compute-artifactory.amd.com:http' (-5 - No address associated with hostname)
W: Some index files failed to download. They have been ignored, or old ones used instead.
87 packages can be upgraded. Run 'apt list --upgradable' to see them.

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
libpython3.8 libpython3.8-dev libpython3.8-minimal libpython3.8-stdlib
python3.8 python3.8-dev python3.8-minimal
Suggested packages:
python3.8-doc
The following NEW packages will be installed:
python3.8-venv
The following packages will be upgraded:
libpython3.8 libpython3.8-dev libpython3.8-minimal libpython3.8-stdlib
python3.8 python3.8-dev python3.8-minimal
7 upgraded, 1 newly installed, 0 to remove and 80 not upgraded.
Need to get 10.8 MB of archives.
After this operation, 26.6 kB of additional disk space will be used.
Do you want to continue? [Y/n] Abort.
The command '/bin/sh -c apt update && apt install python3.8-venv' returned a non-zero code: 1
ERROR: Service 'sd' failed to build : Build failed


evshiron commented Jul 27, 2023

Change RUN apt update && apt install python3.8-venv to RUN apt update && apt install -y python3.8-venv.

Btw, the tutorial is outdated as of now. If you aren't a big fan of Docker containers, an easy guide can be found here:

Which is a compilation of comments above.
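
The build failure above comes from apt waiting for a Y/n answer that a Docker build cannot give. As a rough sketch (the function name is hypothetical), a Dockerfile can be scanned for such lines before building. Combined short flags such as -qy are not recognised by this simple check.

```python
import re

# Captures everything after "apt install" / "apt-get install" up to the next
# shell separator, so the options of that one command can be inspected.
APT_INSTALL = re.compile(r"\b(apt|apt-get)\s+install\b(?P<rest>[^&;|]*)")
YES_FLAGS = ("-y", "--yes", "--assume-yes")


def interactive_apt_installs(dockerfile_text):
    """Return RUN lines whose apt install lacks a non-interactive yes flag."""
    flagged = []
    for line in dockerfile_text.splitlines():
        if not line.lstrip().upper().startswith("RUN "):
            continue
        for match in APT_INSTALL.finditer(line):
            options = match.group("rest").split()
            if not any(opt in YES_FLAGS for opt in options):
                flagged.append(line.strip())
                break
    return flagged


print(interactive_apt_installs("RUN apt update && apt install python3.8-venv"))
```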


Sega999 commented Jul 27, 2023

thank you


f1am3d commented Aug 28, 2023

@evshiron thanks! This is working. 👍


f1am3d commented Aug 28, 2023

Failed on step #21.


f1am3d commented Aug 28, 2023

> I cant get it to build I keep getting a python error
>
> The command '/bin/sh -c apt update && apt install python3.8-venv' returned a non-zero code: 1 ERROR: Service 'sd' failed to build : Build failed

Shame here. Hey author, did you test it at all? @In-line
