
@harishanand95
Last active June 6, 2024 08:42
Stable Diffusion on AMD GPUs on Windows using DirectML

UPDATE: A faster (20x) approach for running Stable Diffusion using MLIR/Vulkan/IREE is available on Windows:

https://github.com/nod-ai/SHARK/blob/main/shark/examples/shark_inference/stable_diffusion/stable_diffusion_amd.md

Install 🤗 diffusers

conda create --name sd39 python=3.9 -y
conda activate sd39
pip install diffusers==0.3.0
pip install transformers
pip install onnxruntime
pip install onnx
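
You can confirm the packages installed with a quick import check (a minimal sketch; it only verifies the imports and prints versions; the plain onnxruntime package is replaced by the DirectML build in the next step):

python -c "import diffusers, onnxruntime; print(diffusers.__version__, onnxruntime.__version__)"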

Install DirectML latest release

You can download the nightly onnxruntime-directml release from the link below.

Run python --version to find out which .whl file to download.

  • If you are on Python 3.7, download the file that ends with -cp37-cp37m-win_amd64.whl.
  • If you are on Python 3.8, download the file that ends with -cp38-cp38-win_amd64.whl.
  • and so on for other versions.
pip install ort_nightly_directml-1.13.0.dev20220908001-cp39-cp39-win_amd64.whl --force-reinstall
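
After the wheel installs, you can check that the DirectML execution provider is registered (this uses the standard onnxruntime API; the list printed may vary by build):

python -c "import onnxruntime; print(onnxruntime.get_available_providers())"

The output should include DmlExecutionProvider.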

Convert Stable Diffusion model to ONNX format

This approach is faster than downloading the ONNX model files.

wget https://raw.githubusercontent.com/huggingface/diffusers/main/scripts/convert_stable_diffusion_checkpoint_to_onnx.py
  • Run huggingface-cli.exe login and provide your Hugging Face access token.
  • Convert the model using the command below. The converted models are stored in the stable_diffusion_onnx folder.
python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="CompVis/stable-diffusion-v1-4" --output_path="./stable_diffusion_onnx"
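
Once the conversion finishes, you can sanity-check one of the exported submodels before running the full pipeline (a minimal sketch; the unet/model.onnx path assumes the folder layout written by the conversion script):

from onnxruntime import InferenceSession

# Load the exported UNet on DirectML; this only checks that the file parses
# and that the provider initializes. No inference is run.
sess = InferenceSession("./stable_diffusion_onnx/unet/model.onnx", providers=["DmlExecutionProvider"])
print([inp.name for inp in sess.get_inputs()])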

Run Stable Diffusion on AMD GPUs

Here is example Python code for a Stable Diffusion pipeline using Hugging Face diffusers.

from diffusers import StableDiffusionOnnxPipeline

# Load the converted ONNX pipeline on the DirectML execution provider
pipe = StableDiffusionOnnxPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider")

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
image.save("astronaut_rides_horse.png")
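
The pipeline call also accepts the usual generation parameters. A hedged variation (num_inference_steps and guidance_scale are standard diffusers pipeline arguments; the values here are only illustrative):

image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("astronaut_rides_horse_50steps.png")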
@nomanHasan

Thank you @claforte @harishanand95 for your efforts at making Stable Diffusion more accessible. I run an RX 580 (GFX803), which seems to have lost AMD ROCm support long ago. Still, the internet is full of workarounds that, in my experience, do not work. Looking forward to you guys' hard work getting us onto the open-source API method.

@cpietsch

The main issue here is the Windows route. If you use Linux you can even use the go-to Stable Diffusion UI: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs
Still, I would love to see Windows support through the Vulkan API.
If I understand it correctly, we need to convert the SD model to SPIR-V using iree-compiler?
There is an example using SHARK: https://github.com/nod-ai/SHARK/blob/b448770ec26d8b8b0cf332f752915ac39b02d935/shark/examples/shark_inference/stable_diff.py

@nomanHasan

@cpietsch It doesn't work very well for Linux either. The Linux-exclusive ROCm only properly supports their workstation GPUs, and support for consumer GPUs is lagging. You'd have to follow weird workarounds to get them working on recent cards. And for slightly older cards like GFX803, it turns out to be impossible.

@cpietsch

Oh, sorry about that. It worked out of the box for my Radeon VII and I thought this was the same for the rest.

@harishanand95
Author

Hello everyone. As Christian mentioned, we have added a new pipeline for AMD GPUs using MLIR/IREE. This approach significantly boosts the performance of running Stable Diffusion on Windows and avoids the current ONNX/DirectML approach.

Instructions: https://github.com/nod-ai/SHARK/blob/main/shark/examples/shark_inference/stable_diffusion/stable_diffusion_amd.md

Please reach out to us via the Discord link on the instructions page, or create GitHub issues if something does not work for you.

Thanks!

@averad, could you please give it a try and update your instructions too? You can reach us on the Discord channel if you have any questions. Thanks!

@averad

averad commented Dec 1, 2022

@harishanand95 I will give it a try and update the Instructions.

@averad

averad commented Dec 2, 2022

@harishanand95 I wasn't able to test the process, as IREE doesn't have support for RX 500 series cards (GCNv3).

I've suggested adding def VK_TTA_RGCNv3 : I32EnumAttrCase<"AMD_RGCNv3", 103, "rgcn3">; and am working on compiling IREE with my suggested changes for testing.

@cpietsch

cpietsch commented Dec 4, 2022

I am getting 3.85 it/s on my 6900 XT on SHARK (Vulkan); that is 13 seconds for 50 iterations.

@phreeware

Hi, the exe doesn't work for me following your little guide (using the MLIR driver on a 6900 XT); I'm getting errors:
[screenshot of errors]

I'll try the manual guide.

@cpietsch

cpietsch commented Dec 4, 2022

For me the Advanced Installation worked

@Dwakener

Dwakener commented Mar 2, 2023

Generation time?
