averad/Stable_Diffusion.md

Forked from harishanand95/Stable_Diffusion.md

Last active July 10, 2025 21:20


Stable Diffusion on AMD GPUs on Windows using DirectML


🤗 Stable Diffusion for AMD GPUs on Windows using DirectML

Requirements

Python 3.10 or earlier installed (https://www.python.org/downloads/)
Git installed (https://gitforwindows.org/)

Installation

Create a Folder to Store Stable Diffusion Related Files

Open File Explorer and navigate to your preferred storage location.
Create a new folder named "Stable Diffusion" and open it.
In File Explorer's address bar, highlight the folder path, type cmd and press Enter. A command prompt opens in that folder.

Install 🤗 diffusers

The following steps create a virtual environment (using venv) named sd_env in the folder your cmd window is open to, then install diffusers and transformers (latest from their main branches), onnxruntime, onnx, torch, ftfy, spacy, scipy, onnxruntime-directml and protobuf:

pip install virtualenv
python -m venv sd_env
.\sd_env\Scripts\activate
python -m pip install --upgrade pip
pip install git+https://github.com/huggingface/diffusers.git
pip install git+https://github.com/huggingface/transformers.git
pip install onnxruntime onnx torch ftfy spacy scipy
pip install onnxruntime-directml --force-reinstall
pip install protobuf==3.20.1

To exit the virtual environment, type deactivate or close the command prompt. To start it again later, open a command prompt in the Stable Diffusion folder and run .\sd_env\Scripts\activate (or open a command prompt in sd_env\Scripts and type activate).
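A quick way to sanity-check the setup is a short script run inside the activated environment. The sketch below is not part of the original guide: check_environment is a hypothetical helper that compares the Python version with the 3.10 requirement, checks whether a virtual environment appears to be active (inside a venv, sys.prefix differs from sys.base_prefix), and checks whether onnxruntime lists DmlExecutionProvider among its available providers.

```python
import importlib.util
import sys

def check_environment(version_info=sys.version_info):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    major, minor = version_info[0], version_info[1]
    if (major, minor) > (3, 10):
        problems.append(f"Python {major}.{minor} is newer than the 3.10 this guide requires")
    # Inside a venv, sys.prefix points at the environment, sys.base_prefix at the base install
    if sys.prefix == sys.base_prefix:
        problems.append("No virtual environment is active (run .\\sd_env\\Scripts\\activate)")
    if importlib.util.find_spec("onnxruntime") is None:
        problems.append("onnxruntime is not installed")
    else:
        import onnxruntime
        if "DmlExecutionProvider" not in onnxruntime.get_available_providers():
            problems.append("DmlExecutionProvider is missing; reinstall onnxruntime-directml")
    return problems

for problem in check_environment() or ["No problems found"]:
    print(problem)
```

Save it as, for example, check_env.py and run it with python check_env.py after activating sd_env.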

Download the Stable Diffusion ONNX model

You will need to go to: https://huggingface.co/runwayml/stable-diffusion-v1-5 and https://huggingface.co/runwayml/stable-diffusion-inpainting. Review and accept the usage/download agreements before completing the following steps.

stable-diffusion-v1-5 uses 5.10 GB
stable-diffusion-inpainting uses 5.10 GB

If your model folders are larger than this after cloning, open stable_diffusion_onnx and stable_diffusion_onnx_inpainting and delete the .git folders inside them.

git clone https://huggingface.co/runwayml/stable-diffusion-v1-5 --branch onnx --single-branch stable_diffusion_onnx
git clone https://huggingface.co/runwayml/stable-diffusion-inpainting --branch onnx --single-branch stable_diffusion_onnx_inpainting

Enter your Hugging Face credentials when prompted and the download will start. Once it completes, you are ready to start using Stable Diffusion.
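To compare a cloned model folder with the 5.10 GB figures above, a small standard-library sketch can report its size with and without the .git folder. folder_size_gb is a hypothetical helper, and it assumes the guide's sizes are in the 1024-based units that Windows Explorer displays.

```python
import os

def folder_size_gb(path, skip_git=False):
    """Total size of all files under path, in 1024-based GB."""
    total = 0
    for root, dirs, files in os.walk(path):
        if skip_git and ".git" in dirs:
            # Removing the entry in place stops os.walk from descending into .git
            dirs.remove(".git")
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1024**3

for folder in ("stable_diffusion_onnx", "stable_diffusion_onnx_inpainting"):
    if os.path.isdir(folder):
        print(f"{folder}: {folder_size_gb(folder):.2f} GB total, "
              f"{folder_size_gb(folder, skip_git=True):.2f} GB without .git")
```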

Scripts / Examples

Copy one of the examples below and save it as a .py file. Then run it by typing "python name_of_the_file.py" in a cmd window with the sd_env virtual environment activated.

Stable Diffusion Txt 2 Img on AMD GPUs

Here is example Python code for the ONNX Stable Diffusion pipeline (OnnxStableDiffusionPipeline) using Hugging Face diffusers.

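from diffusers import OnnxStableDiffusionPipeline

height = 512
width = 512
num_inference_steps = 50
guidance_scale = 7.5
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt = "bad hands, blurry"

# Load the ONNX model folder cloned earlier and run it on the GPU through DirectML
pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)

# Keyword arguments keep the call correct even if the parameter order changes between diffusers versions
image = pipe(
    prompt=prompt,
    height=height,
    width=width,
    num_inference_steps=num_inference_steps,
    guidance_scale=guidance_scale,
    negative_prompt=negative_prompt,
).images[0]
image.save("astronaut_rides_horse.png")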

Stable Diffusion Img 2 Img on AMD GPUs

Here is example Python code for the ONNX Stable Diffusion Img2Img pipeline (OnnxStableDiffusionImg2ImgPipeline) using Hugging Face diffusers.

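from PIL import Image
from diffusers import OnnxStableDiffusionImg2ImgPipeline

# Starting image, converted to RGB and resized to the 512x512 resolution the model was trained on
init_image = Image.open("test.png").convert("RGB").resize((512, 512))
prompt = "A fantasy landscape, trending on artstation"

# revision= only applies when downloading from the Hugging Face Hub, so it is left out for a local folder
pipe = OnnxStableDiffusionImg2ImgPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)

# Current diffusers releases take the starting image as image= (init_image= was the older, deprecated name)
image = pipe(prompt=prompt, image=init_image, strength=0.75, guidance_scale=7.5).images[0]
image.save("test-output.png")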

Stable Diffusion Inpainting on AMD GPUs

Here is example Python code for the ONNX Stable Diffusion Inpaint pipeline (OnnxStableDiffusionInpaintPipeline) using Hugging Face diffusers.

from PIL import Image
from diffusers import OnnxStableDiffusionInpaintPipeline

# The inpainting model uses its own ONNX folder (cloned from stable-diffusion-inpainting above)
pipe = OnnxStableDiffusionInpaintPipeline.from_pretrained("./stable_diffusion_onnx_inpainting", provider="DmlExecutionProvider", safety_checker=None)

# Both the image and the mask are resized to 512x512
init_image = Image.open("test.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("L").resize((512, 512))
prompt = "Face of a yellow cat, high resolution"

# White areas of the mask are repainted, black areas are kept.
# This pipeline takes no strength= argument (that belongs to the img2img pipeline).
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, guidance_scale=7.5).images[0]
image.save("test-output.png")

Images used for inpainting, and their masks, need to be 512 pixels wide and 512 pixels high.

You can make a mask image using Photopea: paint the area to be replaced white and leave the rest black.
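If you would rather generate a simple mask in code than draw one, Pillow (already used by the examples above) can do it. This is a minimal sketch with a hypothetical make_box_mask helper: it creates a 512x512 grayscale mask that is black (kept) everywhere except a white (repainted) rectangle. The box coordinates are placeholder values to adjust for your image.

```python
from PIL import Image, ImageDraw

def make_box_mask(box, size=(512, 512)):
    """Black canvas (keep) with a white rectangle (repaint); box = (left, top, right, bottom)."""
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    return mask

# Example: mark a region in the upper middle of the image for repainting
mask = make_box_mask((160, 96, 352, 288))
mask.save("mask.png")
```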

Example Txt2Img Script With More Features

The user is prompted in the console for image parameters.
The date/time, image parameters and completion time are logged in a text file, "prompts.txt".
Each image is saved as date-time.png (date-time = the time image generation was started).
After each run, the user can change the model or scheduler and enter another prompt, or enter q to quit.

import os
import gc
import sys
import time
import traceback
import numpy as np
from diffusers import OnnxStableDiffusionPipeline
from diffusers import (
    DDPMScheduler,
    DDIMScheduler,
    PNDMScheduler,
    LMSDiscreteScheduler,
    EulerDiscreteScheduler,
    EulerAncestralDiscreteScheduler,
    DPMSolverMultistepScheduler,
)

output_folder = "complete"
log_folder = "complete"
models_folder = "model"

def choose_model():
    model=None
    while model == None:
        os.system('cls')
        print('Stable Diffusion Onnx DirectML\nText to Img\n')
        model_root = os.path.realpath(os.path.dirname(__file__))+"\\"+models_folder+"\\"
        model_list = [ item for item in os.listdir(model_root) if os.path.isdir(os.path.join(model_root, item)) ]
        if not model_list:
            call_quit(
                "No models found.\nPlease place your model folders in " + str(model_root)
                + " or update the 'models_folder' variable in this script")
        model_choices = "Available Models\n"
        x = 1
        for i in model_list:
            model_choices += str(x) + " (" + str(i) + ")\n"
            x += 1
        model_choices += "Please Choose a Model#: (or q to quit): "
        user_input_model = input(model_choices)
        if user_input_model == "q":
            call_quit("Quit Called, Script Ended")
        if user_input_model.isnumeric():
            if 1 <= int(user_input_model) <= len(model_list):  # 0 would wrap around to the last model
                model = str(model_root)+str(model_list[int(user_input_model)-1])
        else:
            model = None
    return model

def choose_scheduler(model):
    sched=None
    scheduler_list = [
        [1,DDPMScheduler.from_pretrained(model, subfolder="scheduler"),"DDPMScheduler"],
        [2,DDIMScheduler.from_pretrained(model, subfolder="scheduler"),"DDIMScheduler"],
        [3,PNDMScheduler.from_pretrained(model, subfolder="scheduler"),"PNDMScheduler"],
        [4,LMSDiscreteScheduler.from_pretrained(model, subfolder="scheduler"),"LMSDiscreteScheduler"],
        [5,EulerAncestralDiscreteScheduler.from_pretrained(model, subfolder="scheduler"),
        "EulerAncestralDiscreteScheduler"],
        [6,EulerDiscreteScheduler.from_pretrained(model, subfolder="scheduler"),"EulerDiscreteScheduler"],
        [7,DPMSolverMultistepScheduler.from_pretrained(model, subfolder="scheduler"),
        "DPMSolverMultistepScheduler"],
    ]
    os.system('cls')
    scheduler_choices = "Available Schedulers\n"
    for i in scheduler_list:
        scheduler_choices += str(i[0]) + " (" + str(i[2]) + ")\n"
    scheduler_choices += "Please Choose a Scheduler#: (or q to quit): "
    while sched == None:
        os.system('cls')
        print('Stable Diffusion Onnx DirectML\nText to Img\n')
        user_input_sched = input(scheduler_choices)
        if user_input_sched == "q":
            call_quit("Quit Called, Script Ended")
        for i in scheduler_list:
            if user_input_sched == str(i[0]):
                sched = i[1]
                sched_txt = str(i[2])
    return sched_txt, sched;

def loadPipe(model=None, sched=None, sched_txt=None, provider="DmlExecutionProvider"):
    pipe = None
    if model == None:
        model = choose_model()
    if sched_txt == None and sched == None:
        sched_txt, sched = choose_scheduler(model)
    os.system('cls')
    pipe = OnnxStableDiffusionPipeline.from_pretrained(
        model,
        revision="onnx",
        provider=provider, 
        safety_checker=None,
        scheduler=sched,
    )
    return model, pipe, sched_txt, sched;

def txt_to_img(prompt, negative_prompt, num_inference_steps, guidance_scale, width, height, seed):
    gen_time = time.strftime("%m%d%Y-%H%M%S")
    rng = np.random.RandomState(seed)
    start_time = time.time()
    image = pipe(
        prompt,
        height,
        width,
        num_inference_steps,
        guidance_scale,
        negative_prompt,
        generator = rng,
        ).images[0]
    image.save(os.path.join(output_folder, gen_time + ".png"))
    del image
    del rng
    gc.collect()
    log_info = "\n" + gen_time + " - Seed: " + str(seed) + " - Gen Time: "+ str(time.time() - start_time) + "s"
    with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
        f.write(log_info)

def check_folders(output_folder, log_folder, models_folder):
    output_folder_check = os.path.isdir(output_folder)
    if not output_folder_check: 
        os.makedirs(output_folder)
    log_folder_check = os.path.isdir(log_folder)
    if not log_folder_check:
        os.makedirs(log_folder)
    models_folder_check = os.path.isdir(models_folder)
    if not models_folder_check:
        os.makedirs(models_folder)

def call_quit(msg):
    global pipe
    try:
        del pipe  # release the loaded ONNX sessions before exiting
    except NameError:
        pass
    gc.collect()
    os.system('cls')
    sys.exit(str(msg))

check_folders(output_folder, log_folder, models_folder)
user_input = None
prev_height = None
prev_width = None
reload = False
error = ["", False]
model, pipe, sched_txt, sched = loadPipe()
while user_input == None:
    os.system('cls')
    print('Stable Diffusion Onnx DirectML (' + model + ' - ' + sched_txt + ')\nText to Img\n')
    prompt=None
    while prompt == "" or prompt == None:
        prompt = input('Please Enter Prompt (or q to quit): ')
    if prompt != "q":
        negative_prompt = input('Please Enter Negative Prompt (Optional): ')
        variations=None
        while variations == None:
            variations = input('How Many Images? (Optional): ')
            # Anything that is not a whole number of at least 1 falls back to one image
            if not variations.isnumeric() or int(variations) < 1:
                variations = "1"
        num_inference_steps = input('Please Enter # of Inference Steps (Optional): ')
        if num_inference_steps.isnumeric() == False:
            num_inference_steps = 50
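        guidance_scale = input('Please Enter Guidance Scale (Optional): ')
        # float() accepts decimal values such as 7.5, which isnumeric() would reject
        try:
            guidance_scale = float(guidance_scale)
        except ValueError:
            guidance_scale = 7.5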
        width = input('Please Enter Width 512 576 640 704 768 832 896 960 (Optional): ')
        width = int(width) if width.isnumeric() else 512
        if prev_width != None:
            if prev_width != width:
                prev_width = width
                reload = True
        else:
            prev_width = width
        height = input('Please Enter Height 512 576 640 704 768 832 896 960 (Optional): ')
        height = int(height) if height.isnumeric() else 512
        if prev_height != None:
            if prev_height != height:
                prev_height = height
                reload = True
        else:
            prev_height = height
        seed = input('Please Enter Seed (Optional): ')
        if seed.isnumeric() == False:
            seed = None
        gen_time = time.strftime("%m%d%Y-%H%M%S")
        log_info = "\n" + gen_time + " - Model: " + model + " Scheduler: " + sched_txt
        log_info += "\n" + gen_time + " - Prompt: " + prompt
        log_info += "\n" + gen_time + " - Neg_Prompt: " + negative_prompt
        log_info += "\n" + gen_time + " - Inference Steps: " + str(num_inference_steps) + " Guidance Scale: " \
+ str(guidance_scale) + " Width: " + str(width) + " Height: " + str(height)
        with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
            f.write(log_info)
        if seed == "" or seed == None:
            rng = np.random.default_rng()
            seed = rng.integers(np.iinfo(np.uint32).max)
        else:
            try:
                seed = int(seed) & np.iinfo(np.uint32).max
            except ValueError:
                seed = hash(seed) & np.iinfo(np.uint32).max
        seeds = np.array([seed], dtype=np.uint32)
        if int(variations) > 1:
            seed_seq = np.random.SeedSequence(seed)
            seeds = np.concatenate((seeds, seed_seq.generate_state(int(variations) - 1)))
        if reload == True:
            del pipe
            gc.collect()
            model, pipe, sched_txt, sched = loadPipe(model, sched, sched_txt)
            reload = False
        os.system('cls')
        print('Stable Diffusion Onnx DirectML (' + model + ' - ' + sched_txt + ')\nText to Img\n')
        for i in range(int(variations)):
            print(str(i+1) + "/" + str(variations))
            try:
                txt_to_img(str(prompt), str(negative_prompt), int(num_inference_steps), float(guidance_scale), int(width), int(height), int(seeds[i]))
            except KeyboardInterrupt:
                gen_time = time.strftime("%m%d%Y-%H%M%S")
                log_info = "\n" + gen_time + " - Error: Keyboard Interrupt"
                log_info += "\n--------------------------------------------------"
                with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
                    f.write(log_info)
                call_quit("CTRL+C Pressed, Script Ended")
            except Exception as e:
                error = [str(e),True]
                gen_time = time.strftime("%m%d%Y-%H%M%S")
                log_info = "\n" + gen_time + " - Error: " + str(e)
                log_info += "\n" + gen_time + " - " + traceback.format_exc()
                with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
                    f.write(log_info)
                break
        log_info = "\n--------------------------------------------------"
        with open('./'+log_folder+'/prompts.txt', 'a+', encoding="utf-16") as f:
            f.write(log_info)
        prompt = None
        variations = None
        os.system('cls')
        print('Stable Diffusion Onnx DirectML (' + model + ' - ' + sched_txt + ')\nText to Img\n')
        if error[1] == True:
            print("Image Generation Failed\nError: " + error[0] + "\nSee './" + log_folder + "/prompts.txt' for more info\n")
            error = ["", False]
        change_model = ""
        while change_model == "":
            change_model = input('Change Model? (y/n) or (q to quit): ')
            if change_model == "y" or change_model == "Y":
                del pipe
                model, pipe, sched_txt, sched = loadPipe()
            elif change_model == "q":
                call_quit("Quit Called, Script Ended")
            elif change_model == "n" or change_model == "N":
                change_sched = ""
                while change_sched == "":
                    change_sched = input('Change Scheduler? (y/n) or (q to quit): ')
                    if change_sched == "y" or change_sched == "Y":
                        sched_txt, sched = choose_scheduler(model)
                        if type(pipe.scheduler) is not type(sched):
                            pipe.scheduler = sched
                    elif change_sched == "q":
                        call_quit("Quit Called, Script Ended")
            else:
                change_model = ""
    else:
        call_quit("Quit Called, Script Ended")

Output

prompts.txt

10232022-233730 - Model: ./stable_diffusion_onnx
10232022-233730 - Prompt: cat
10232022-233730 - Neg_Prompt: dog
10232022-233730 - Inference Steps: 50 Guidance Scale: 7.5 Width: 512 Height: 512
10232022-233730 - Seed: 22220167420300 - Gen Time: 250.15623688697815s
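Because each image is saved as date-time.png and the matching Seed line in prompts.txt starts with the same date-time, the log can be used to look up the seed of an image you want to regenerate. A minimal sketch, assuming the log format shown above; read_seeds is a hypothetical helper, and the log has to be opened with encoding="utf-16" because that is how the script writes it.

```python
import re

# Matches lines such as: 10232022-233730 - Seed: 22220167420300 - Gen Time: 250.15623688697815s
SEED_LINE = re.compile(r"^(\d{8}-\d{6}) - Seed: (\d+) - Gen Time: ([\d.]+)s$")

def read_seeds(log_path="complete/prompts.txt"):
    """Return (image_name, seed, seconds) for every Seed line in the log."""
    results = []
    with open(log_path, encoding="utf-16") as f:
        for line in f:
            match = SEED_LINE.match(line.strip())
            if match:
                stamp, seed, seconds = match.groups()
                results.append((stamp + ".png", int(seed), float(seconds)))
    return results
```

Entering a seed from this list at the script's seed prompt, with the same model, scheduler and parameters, should reproduce that image.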

Convert Stable Diffusion model to ONNX format

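Some models are not available in ONNX format and will need to be converted. Converting a .ckpt file is a two-step process: first convert it to the diffusers format (Convert Original Stable Diffusion to Diffusers), then convert that diffusers folder to ONNX (Convert Stable Diffusion Checkpoint to Onnx). Models already published in the diffusers format only need the second step.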

Install wget for Windows

Download wget for Windows and install the package.
Copy the wget.exe file into your C:\Windows\System32 folder.

Convert Original Stable Diffusion to Diffusers (Ckpt File)

Example File to Convert: Anything-V3.0.ckpt
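Download the latest version of the Convert Original Stable Diffusion to Diffusers script (convert_original_stable_diffusion_to_diffusers.py): https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py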
Run python convert_original_stable_diffusion_to_diffusers.py --checkpoint_path="./model.ckpt" --dump_path="./model_diffusers"

Notes:

Change --checkpoint_path="./model.ckpt" to match the ckpt file to convert
Change --dump_path="./model_diffusers" to the output folder location to use
You will need to run Convert Stable Diffusion Checkpoint to Onnx (see below) to use the model

Convert Stable Diffusion Checkpoint to Onnx

Example File to Convert: waifu-diffusion
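Download the latest version of the Convert Stable Diffusion Checkpoint to Onnx script (convert_stable_diffusion_checkpoint_to_onnx.py) from https://github.com/huggingface/diffusers/tree/main/scripts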
Run python convert_stable_diffusion_checkpoint_to_onnx.py --model_path="./model_diffusers" --output_path="./model_onnx"
Change --model_path="./model_diffusers" to the diffusers model folder to convert, and --output_path="./model_onnx" to the output folder location to use
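Since a .ckpt file needs both scripts, run in order, the two commands can be chained. The sketch below is a hypothetical convenience wrapper, not part of diffusers: it assumes both scripts have been downloaded into the current folder and that sd_env is active, and it passes only the arguments shown in the two Run lines.

```python
import subprocess
import sys

def conversion_commands(ckpt_path, diffusers_dir, onnx_dir):
    """Step 1: .ckpt -> diffusers folder. Step 2: diffusers folder -> ONNX folder."""
    return [
        [sys.executable, "convert_original_stable_diffusion_to_diffusers.py",
         f"--checkpoint_path={ckpt_path}", f"--dump_path={diffusers_dir}"],
        [sys.executable, "convert_stable_diffusion_checkpoint_to_onnx.py",
         f"--model_path={diffusers_dir}", f"--output_path={onnx_dir}"],
    ]

def convert_ckpt_to_onnx(ckpt_path, diffusers_dir="./model_diffusers", onnx_dir="./model_onnx"):
    for cmd in conversion_commands(ckpt_path, diffusers_dir, onnx_dir):
        # check=True raises an error and skips step 2 if step 1 fails
        subprocess.run(cmd, check=True)

# Usage: convert_ckpt_to_onnx("./Anything-V3.0.ckpt")
```

Using sys.executable runs both scripts with the same Python interpreter (and therefore the same virtual environment) as the wrapper itself.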

Additional Tools (Optional)

Upscaling

Real-ESRGAN

https://github.com/xinntao/Real-ESRGAN#portable-executable-files-ncnn

Usage: realesrgan-ncnn-vulkan.exe -i infile -o outfile [options]...

  -h                   show this help
  -i input-path        input image path (jpg/png/webp) or directory
  -o output-path       output image path (jpg/png/webp) or directory
  -s scale             upscale ratio (can be 2, 3, 4. default=4)
  -t tile-size         tile size (>=32/0=auto, default=0) can be 0,0,0 for multi-gpu
  -m model-path        folder path to the pre-trained models. default=models
  -n model-name        model name (default=realesr-animevideov3, can be realesr-animevideov3 | realesrgan-x4plus | realesrgan-x4plus-anime | realesrnet-x4plus)
  -g gpu-id            gpu device to use (default=auto) can be 0,1,2 for multi-gpu
  -j load:proc:save    thread count for load/proc/save (default=1:2:2) can be 1:2,2,2:2 for multi-gpu
  -x                   enable tta mode
  -f format            output image format (jpg/png/webp, default=ext/png)
  -v                   verbose output
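To upscale everything the txt2img script saved, the executable can be called from Python with the -i, -o, -s and -n options listed above (input and output may both be directories). A minimal sketch, assuming realesrgan-ncnn-vulkan.exe is on your PATH or in the current folder; realesrgan_command and upscale_folder are hypothetical helpers and the upscaled folder name is a placeholder.

```python
import subprocess
from pathlib import Path

def realesrgan_command(input_path, output_path, scale=4, model_name="realesrgan-x4plus",
                       exe="realesrgan-ncnn-vulkan.exe"):
    """Build the command line using only options from the usage text above."""
    return [exe, "-i", str(input_path), "-o", str(output_path),
            "-s", str(scale), "-n", model_name]

def upscale_folder(input_dir="complete", output_dir="upscaled", **options):
    # Create the output directory first so the tool can write into it
    Path(output_dir).mkdir(exist_ok=True)
    subprocess.run(realesrgan_command(input_dir, output_dir, **options), check=True)
```

The complete folder also holds prompts.txt, which is not an image; if the tool complains about it, point -i at a folder that contains only images.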

RealSR ncnn Vulkan

https://github.com/nihui/realsr-ncnn-vulkan

Usage: realsr-ncnn-vulkan -i infile -o outfile [options]...

  -h                   show this help
  -v                   verbose output
  -i input-path        input image path (jpg/png/webp) or directory
  -o output-path       output image path (jpg/png/webp) or directory
  -s scale             upscale ratio (4, default=4)
  -t tile-size         tile size (>=32/0=auto, default=0) can be 0,0,0 for multi-gpu
  -m model-path        realsr model path (default=models-DF2K_JPEG)
  -g gpu-id            gpu device to use (-1=cpu, default=0) can be 0,1,2 for multi-gpu
  -j load:proc:save    thread count for load/proc/save (default=1:2:2) can be 1:2,2,2:2 for multi-gpu
  -x                   enable tta mode
  -f format            output image format (jpg/png/webp, default=ext/png)

SRMD ncnn Vulkan

https://github.com/nihui/srmd-ncnn-vulkan

Usage: srmd-ncnn-vulkan -i infile -o outfile [options]...

  -h                   show this help
  -v                   verbose output
  -i input-path        input image path (jpg/png/webp) or directory
  -o output-path       output image path (jpg/png/webp) or directory
  -n noise-level       denoise level (-1/0/1/2/3/4/5/6/7/8/9/10, default=3)
  -s scale             upscale ratio (2/3/4, default=2)
  -t tile-size         tile size (>=32/0=auto, default=0) can be 0,0,0 for multi-gpu
  -m model-path        srmd model path (default=models-srmd)
  -g gpu-id            gpu device to use (default=0) can be 0,1,2 for multi-gpu
  -j load:proc:save    thread count for load/proc/save (default=1:2:2) can be 1:2,2,2:2 for multi-gpu
  -x                   enable tta mode
  -f format            output image format (jpg/png/webp, default=ext/png)

Image Editing

ImageMagick

Use ImageMagick® to create, edit, compose, or convert digital images. It can read and write images in a variety of formats (over 200) including PNG, JPEG, GIF, WebP, HEIC, SVG, PDF, DPX, EXR and TIFF. ImageMagick can resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.

https://imagemagick.org/script/index.php

Photopea

Photopea is a web-based photo and graphics editor by Ivan Kuckir. It is used for image editing, making illustrations, web design or converting between different image formats. Photopea is advertising-supported software. It is compatible with all modern web browsers, including Opera, Edge, Chrome, and Firefox. The app is compatible with raster and vector graphics, such as Photoshop’s PSD as well as JPEG, PNG, DNG, GIF, SVG, PDF and other image file formats. While browser-based, Photopea stores all files locally, and does not upload any data to a server.

https://www.photopea.com/

FAQs

How can I clear cached models?

huggingface-cli scan-cache --dir ~/.cache/huggingface/diffusers
huggingface-cli delete-cache --dir ~/.cache/huggingface/diffusers

Can I download and install ort-nightly-directml instead of onnxruntime-directml?

Yes, and it may provide faster image generation times.

You can download the nightly onnxruntime-directml release from the link below

https://aiinfra.visualstudio.com/PublicPackages/_artifacts/feed/ORT-Nightly/PyPI/ort-nightly-directml/versions/

Run python --version to find out which .whl file to download.

Which file should I download?

If you are on Python 3.7, download the file that ends with -cp37-cp37m-win_amd64.whl
If you are on Python 3.8, download the file that ends with -cp38-cp38-win_amd64.whl
If you are on Python 3.9, download the file that ends with -cp39-cp39-win_amd64.whl
If you are on Python 3.10, download the file that ends with -cp310-cp310-win_amd64.whl
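The pattern behind these file names is the CPython version tag, repeated as the ABI tag (with an extra m on Python 3.7 and earlier). The hypothetical wheel_suffix helper below, a sketch rather than anything pip provides, prints the ending to look for based on the Python that runs it, assuming 64-bit Windows:

```python
import sys

def wheel_suffix(version_info=sys.version_info):
    """Ending of a CPython wheel file name for 64-bit Windows."""
    major, minor = version_info[0], version_info[1]
    tag = f"cp{major}{minor}"
    # Up to Python 3.7 the ABI tag carries an "m" suffix; from 3.8 onward it does not
    abi = tag + "m" if (major, minor) <= (3, 7) else tag
    return f"-{tag}-{abi}-win_amd64.whl"

print("Look for a file ending in", wheel_suffix())
```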

pip install replace_with_the_file_you_downloaded.whl --force-reinstall
pip install protobuf==3.20.1

How do you install or use different models?

Instructions for converting models to the Onnx format are available at: https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674#convert-stable-diffusion-model-to-onnx-format

If the model you want to use is already in the Onnx format, you need to adjust the pipe to call the model you want to use:

Example:

pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)

In the above pipe example, you would change ./stable_diffusion_onnx to match the model folder you want to use.

If you want to load an Onnx Model directly from the Huggingface website and cache it in your virtual environment (sd_env), adjust the pipe as follows:

Example:

pipe = OnnxStableDiffusionPipeline.from_pretrained("lambdalabs/sd-pokemon-diffusers", revision="onnx", provider="DmlExecutionProvider", safety_checker=None)

Note: Loading models directly from the Hugging Face website requires running huggingface-cli login and entering the requested token information.

Author

averad commented Nov 28, 2022

👻 IREE - Getting started - Building from Source (Windows)
https://gist.github.com/averad/b0c020eaf9e0a480660b0476954f600a

@claforte @harishanand95 - [Document] Basic workflow for building IREE for Windows

jamiecropley commented Nov 30, 2022

Can I somehow select which GPU (I have 2 AMDs in my system) will be used? Currently only the slow APU (Vega 10) is used ...

EDIT: Nevermind, found the solution myself by looking at the unit-tests ;-)

For anyone searching for a solution, this script selects the 2nd GPU:
from diffusers import OnnxStableDiffusionPipeline
height=512
width=512
num_inference_steps=50
guidance_scale=7.5
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt="bad hands, blurry"

gpu_provider = ('DmlExecutionProvider', {
	'device_id': 1,
})

pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider=gpu_provider, safety_checker=None)
image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt).images[0] 
image.save("astronaut_rides_horse.png")
just change device_id to 0 for the first GPU and to 1 for the second.

Can you do a for loop or something to cycle between two GPUs like this?
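One possible approach, sketched under assumptions and not tested on multi-GPU hardware: load one pipeline per GPU using the provider tuple from the comment above, then hand prompts to the GPUs in turn. dml_provider, assign_prompts and generate_on_gpus are hypothetical names. The loop runs the pipelines one after another, so it alternates GPUs but does not use them at the same time, and each pipeline keeps its own copy of the model in that GPU's memory.

```python
from itertools import cycle

def dml_provider(device_id):
    """Provider tuple in the form used above: ('DmlExecutionProvider', {'device_id': N})."""
    return ("DmlExecutionProvider", {"device_id": device_id})

def assign_prompts(prompts, device_ids):
    """Round-robin: pair each prompt with the next GPU id in turn."""
    return list(zip(prompts, cycle(device_ids)))

def generate_on_gpus(prompts, device_ids=(0, 1), model_dir="./stable_diffusion_onnx"):
    # Imported here so the helpers above can be used without diffusers installed
    from diffusers import OnnxStableDiffusionPipeline
    pipes = {
        d: OnnxStableDiffusionPipeline.from_pretrained(
            model_dir, provider=dml_provider(d), safety_checker=None)
        for d in device_ids
    }
    for n, (prompt, d) in enumerate(assign_prompts(prompts, device_ids)):
        image = pipes[d](prompt=prompt).images[0]
        image.save(f"gpu{d}_{n}.png")
```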

JStrbg commented Dec 1, 2022

I'm having an issue converting the stable-diffusion 2 checkpoint to ONNX with this process

Console is spammed with :
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
size mismatch for down_blocks.0.attentions.0.proj_in.weight: copying a param with shape torch.Size([320, 320]) from checkpoint, the shape in current model is torch.Size([320, 320, 1, 1]).

Is there a step I am missing?
It seems to work fine with the https://huggingface.co/runwayml/stable-diffusion-v1-5 checkpoint

EDIT:
The latest update of the script (updated 4 days ago) seems to have fixed this issue for me. After also updating transformers to ==4.22, I got it completely working :)

Author

averad commented Dec 1, 2022

@JStrbg

Make sure you are using the latest version of the conversion scripts. (Updated 6 Days Ago)

https://github.com/huggingface/diffusers/tree/main/scripts

JStrbg commented Dec 1, 2022

Yes I am using the latest https://github.com/huggingface/diffusers/blob/main/scripts/convert_original_stable_diffusion_to_diffusers.py

crazyfox55 commented Dec 6, 2022


Your fix worked great for me with a 6800M GPU and a 6900HS CPU.

This is critical information for anyone on laptops. I struggled for hours trying to force my dedicated GPU to have priority over the integrated one. I searched for gpu_id, force gpu, choose gpu, torch.device("cuda:1"), pipe.to("cuda:1"), "cuda", "gpu" but nothing was helping me fix the problem. My last ditch hope was to read every comment on this page hoping for another way to use stable diffusion with AMD gpu.

@kadrim double plus good work

AkshaySapra commented Dec 17, 2022

How would I apply different styles to the images, like you see people doing on Youtube with I guess the "normal" installation with a NVidia GPU?

Shanesan commented Dec 17, 2022

Has the OnnxStableDiffusionPipeline changed? I cannot add width and height to my pipe without getting an error. I have the following variables and the following pipe:

tall = 600
wide = 600
inference_steps = 10
guidance_multiplier = 10
image = pipe(prompt=prompt_text, guidance_scale=guidance_multiplier, num_inference_steps=inference_steps).images[0]

The above runs fine and gives the default 512x512 image at the multipliers and inference steps I want.

If I try adding width and height like so:
image = pipe(prompt=prompt_text, guidance_scale=guidance_multiplier, num_inference_steps=inference_steps, width=wide, height=tall).images[0]

I get the following error:

Traceback (most recent call last):
  File ".\test.py", line 61, in <module>
    image = pipe(prompt=prompt_text, guidance_scale=guidance_multiplier, num_inference_steps=inference_steps, width=wide, height=tall).images[0]
  File ".\virtualenv\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion.py", line 274, in __call__
    noise_pred = self.unet(sample=latent_model_input, timestep=timestep, encoder_hidden_states=text_embeddings)
  File ".\virtualenv\lib\site-packages\diffusers\onnx_utils.py", line 61, in __call__
    return self.model.run(None, inputs)
  File ".\virtualenv\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Concat node. Name:'/up_blocks.1/Concat' Status Message: D:\a\_work\1\s\onnxruntime\core\providers\dml\DmlExecutionProvider\src\MLOperatorAuthorImpl.cpp(1878)\onnxruntime_pybind11_state.pyd!00007FFB146DF72D: (caller: 00007FFB146E0AEF) Exception(3) tid(49b0) 80070057 The parameter is incorrect.

Amblyopius commented Dec 17, 2022

It might be tripping over the 600x600, which is not really an ideal format. Does it also fail if you use multiples of 64 (which should generally give the best results)? Note that you'll generally get a lot more artefacts if neither width nor height is 512 when the model is trained for 512x512. For example, 704x512 and 512x704 generally still work, though at times with some weird duplication.
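Following that advice, requested sizes can be rounded down to a multiple of 64 before they reach the pipeline, so a request for 600 becomes 576. snap_size is a hypothetical helper, not part of diffusers:

```python
def snap_size(value, multiple=64, minimum=64):
    """Round a requested width or height down to a multiple of `multiple`, never below `minimum`."""
    return max(minimum, (int(value) // multiple) * multiple)

width, height = snap_size(600), snap_size(600)
print(width, height)  # 576 576
```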

Shanesan commented Dec 17, 2022

@Amblyopius thanks for the advice. Before you posted I was trying a --force-reinstall of the latest nightly of ort-nightly-directml, then I tried it at 64x64 just to see and that worked, and now I can push it up to 704x704 (but boy does that slow it down). I can't guarantee that the previous ort-nightly-directml was bugged or something, but this seems to work now! Thank you!

Amblyopius commented Dec 17, 2022

If speed is an issue, shameless plug for my instruction on how to do FP16: https://github.com/Amblyopius/AMD-Stable-Diffusion-ONNX-FP16

catwiesel commented Dec 19, 2022

this is a great post and might be the best to get it to work with amd...

ive been playing around with it for a few days now.

i have a few questions, comments....

I am using the (modified) txt2img script. I have multiple models to try and have tested them. The first scheduler (DDPMScheduler) seems to be completely useless: even when going up to >200 iterations, and using very low, medium, or very high guidance scales, it's mostly garbage. I mean, at around 250 iterations I can sometimes kind of see where it is going...
Does the DDPMScheduler just need 10 or 100 times more iterations than the others? Is some part of the script or my understanding flawed? I did not really find any resources online indicating that what I observe is normal behaviour. But seeing how the same script works for everything else, and how it behaves the same with different models, seeds and prompts while the same settings work fine with any other scheduler, I am really uncertain what is going on here...
What are your experiences?
Regarding "Convert Stable Diffusion model to ONNX format": the way it is written now, it's hard to understand that you have to run both scripts to go from a ckpt file to a working ONNX model.
It would be really nice to have some more insight and script examples showing how it works with inpainting and outpainting.

m8ax commented Jan 15, 2023

Is there a way to convert a ckpt to ONNX? I have tried several ways and none of them worked. If someone knows how, please tell me.

kadrim commented Jan 15, 2023

"Is there a way to convert ckpt to onnx? I have tried several ways and there is no way if someone knows tell me please"

RTFM https://gist.github.com/averad/256c507baa3dcc9464203dc14610d674#convert-stable-diffusion-model-to-onnx-format

catwiesel commented Jan 15, 2023 via email

Yeah, it's a two-step process. It is described in the original text, but it is not well explained that it is a two-step process (which was the second point in my comment that you replied to):
- Convert Original Stable Diffusion to Diffusers (Ckpt File)
- Convert Stable Diffusion Checkpoint to Onnx
You need to follow both to get from ckpt to ONNX, and pay attention to the script/command you run: they are very similar, but not exactly the same...
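The two steps chain together: the folder written by the first script (its `--dump_path`) is the `--model_path` of the second. A minimal Python sketch of that chaining, assuming the two conversion scripts from the diffusers repository, `convert_original_stable_diffusion_to_diffusers.py` (`--checkpoint_path`, `--dump_path`) and `convert_stable_diffusion_checkpoint_to_onnx.py` (`--model_path`, `--output_path`), sit in one folder. The file and folder names are hypothetical placeholders, and `conversion_commands` / `convert_ckpt_to_onnx` are illustrative helpers, not part of diffusers.

```python
# Sketch: run the two conversion steps in order (ckpt -> diffusers folder -> ONNX folder).
# Assumes both diffusers conversion scripts live in scripts_dir; all paths are placeholders.
import subprocess
import sys
from pathlib import Path


def conversion_commands(ckpt_path, diffusers_dir, onnx_dir, scripts_dir="."):
    """Build both commands; step 1's --dump_path becomes step 2's --model_path."""
    scripts = Path(scripts_dir)
    step1 = [
        sys.executable, str(scripts / "convert_original_stable_diffusion_to_diffusers.py"),
        "--checkpoint_path", str(ckpt_path),
        "--dump_path", str(diffusers_dir),
    ]
    step2 = [
        sys.executable, str(scripts / "convert_stable_diffusion_checkpoint_to_onnx.py"),
        "--model_path", str(diffusers_dir),
        "--output_path", str(onnx_dir),
    ]
    return [step1, step2]


def convert_ckpt_to_onnx(ckpt_path, diffusers_dir, onnx_dir, scripts_dir="."):
    """Run both steps; check=True stops before step 2 if step 1 fails."""
    for cmd in conversion_commands(ckpt_path, diffusers_dir, onnx_dir, scripts_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Usage, from the folder holding the scripts with sd_env activated: `convert_ckpt_to_onnx("model.ckpt", "model_diffusers", "model_onnx")`. Passing each command as a list rather than one string avoids quoting problems with folder names containing spaces, such as "Stable Diffusion".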

KrakaD commented Jan 24, 2023

Is there an in-browser version of this that runs locally?

m8ax commented Jan 30, 2023


Nothing... I cannot convert my ckpt model (trained on my face) to ONNX. If anybody wants to convert the ckpt for me, my Telegram is @mviiiax.

KangbingZhao commented Feb 20, 2023

Does anyone else get the error below?

(sdonnx) C:\Data\Code\AI\sd-onnx\stable-difussion>python scripts/t2i.py
2023-02-20 11:12:30.3635554 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-02-20 11:12:31.2427573 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-20 11:12:31.2481745 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-02-20 11:12:36.1750254 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-02-20 11:12:36.2239146 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-20 11:12:36.2288181 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
2023-02-20 11:12:36.8986406 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-02-20 11:12:37.0327712 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-20 11:12:37.0380411 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
C:\Users\zkb\miniconda3\envs\sdonnx\lib\site-packages\transformers\models\clip\feature_extraction_clip.py:28: FutureWarning: The class CLIPFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use CLIPImageProcessor instead.
  warnings.warn(
2023-02-20 11:12:37.6270735 [W:onnxruntime:, inference_session.cc:497 onnxruntime::InferenceSession::RegisterExecutionProvider] Having memory pattern enabled is not supported while using the DML Execution Provider. So disabling it for this session since it uses the DML Execution Provider.
2023-02-20 11:12:37.6803258 [W:onnxruntime:, session_state.cc:1136 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-02-20 11:12:37.6853572 [W:onnxruntime:, session_state.cc:1138 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_onnx_stable_diffusion.OnnxStableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
  2%|█▋                                                                                 | 1/51 [00:03<03:07,  3.75s/it]
Traceback (most recent call last):
  File "C:\Data\Code\AI\sd-onnx\stable-difussion\scripts\t2i.py", line 10, in <module>
    image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt).images[0]
  File "C:\Users\zkb\miniconda3\envs\sdonnx\lib\site-packages\diffusers\pipelines\stable_diffusion\pipeline_onnx_stable_diffusion.py", line 273, in __call__
    noise_pred = self.unet(sample=latent_model_input, timestep=timestep, encoder_hidden_states=prompt_embeds)
  File "C:\Users\zkb\miniconda3\envs\sdonnx\lib\site-packages\diffusers\pipelines\onnx_utils.py", line 60, in __call__
    return self.model.run(None, inputs)
  File "C:\Users\zkb\miniconda3\envs\sdonnx\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail

The script fails if I set the width and height to 512, but it's OK at 256x256. If I switch from DmlExecutionProvider to the CPU provider, 512x512 works. Very weird, because I allocated 16 GB of VRAM to my iGPU (I am using a 5700G with 32 GB of RAM in total).

My script:

from diffusers import OnnxStableDiffusionPipeline
import traceback
height=512
width=512
num_inference_steps=50
guidance_scale=7.5
prompt = "a photo of an astronaut riding a horse on mars"
negative_prompt="bad hands, blurry"
pipe = OnnxStableDiffusionPipeline.from_pretrained("./stable_diffusion_onnx", provider="DmlExecutionProvider", safety_checker=None)
image = pipe(prompt, height, width, num_inference_steps, guidance_scale, negative_prompt).images[0] 
image.save("astronaut_rides_horse.png")
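A rough, back-of-the-envelope estimate of why memory trouble tends to show up first at higher resolutions. It assumes Stable Diffusion v1's layout (latents are 1/8 of the image size, the highest-resolution UNet attention layers use 8 heads), a batch of 2 because classifier-free guidance runs the unconditional and conditional prompts together, and attention scores materialized in fp32. It estimates one attention score matrix only, not total memory use, and it does not by itself explain this failure given the 16 GB allocation; it only shows how the cost scales.

```python
# Rough estimate of the size of ONE self-attention score matrix (Q @ K^T) at the
# highest-resolution UNet level. Assumptions, not measurements: SD v1 layout
# (latents = image / 8, 8 attention heads), batch 2 for classifier-free guidance,
# fp32 scores (4 bytes per element).


def attention_scores_bytes(height, width, batch=2, heads=8, bytes_per_elem=4, vae_factor=8):
    tokens = (height // vae_factor) * (width // vae_factor)  # one token per latent pixel
    return batch * heads * tokens * tokens * bytes_per_elem  # scores are tokens x tokens


if __name__ == "__main__":
    for size in (256, 512, 704):
        mib = attention_scores_bytes(size, size) / 2**20
        print(f"{size}x{size}: ~{mib:,.0f} MiB for one attention score matrix")
```

Going from 256x256 to 512x512 quadruples the token count, so this term grows 16 times: from 64 MiB to 1 GiB under these assumptions.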

Amblyopius commented Feb 20, 2023

Probably better to use an up to date guide. Try this: https://github.com/Amblyopius/Stable-Diffusion-ONNX-FP16

wchao1115 commented Jun 6, 2023

onnxruntime-directml v1.15 with DirectML v1.12 came out two weeks ago on nuget.org. All the up-to-date SD optimizations are there, so there is no need to install ort-nightly-directml anymore. In fact, I would caution against using the nightly drops from main between releases at any time, because they can be very unstable.

For more details on how to optimize SD for ONNX and DirectML, check out this official sample. And if you're already seeking the best performance, why stop there? Go download the latest driver update for SD for NVIDIA or AMD.
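To see which ONNX Runtime builds are actually in an environment before following the advice above, a small sketch using only the standard library's `importlib.metadata`. The assumption behind the warnings is that these distributions all provide the same `onnxruntime` Python module, so with more than one installed, whichever was installed last typically wins (which is why the guide's install steps use `--force-reinstall`). The helper names are illustrative, not from any library.

```python
# Sketch: report installed ONNX Runtime distributions and flag likely problems.
# Assumption: these distributions all ship the same "onnxruntime" Python module,
# so normally only one of them should be installed at a time.
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile

ORT_DISTRIBUTIONS = ("onnxruntime", "onnxruntime-directml", "ort-nightly-directml")


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def _leading_int(text):
    digits = "".join(takewhile(str.isdigit, text))
    return int(digits) if digits else 0


def major_minor(version_str):
    """'1.15.0' -> (1, 15); tolerates suffixes such as '1.14.0.dev20221214001'."""
    parts = version_str.split(".")
    minor = _leading_int(parts[1]) if len(parts) > 1 else 0
    return (_leading_int(parts[0]), minor)


def ort_warnings(versions):
    """Given {distribution: version or None}, return a list of warning strings."""
    installed = {name: v for name, v in versions.items() if v is not None}
    warnings = []
    if not installed:
        warnings.append("no ONNX Runtime distribution is installed")
    if len(installed) > 1:
        warnings.append("several ONNX Runtime distributions are installed: "
                        + ", ".join(sorted(installed)))
    if "ort-nightly-directml" in installed:
        warnings.append("ort-nightly-directml is a nightly build; a stable "
                        "onnxruntime-directml release may be more reliable")
    dml = installed.get("onnxruntime-directml")
    if dml is not None and major_minor(dml) < (1, 15):
        warnings.append(f"onnxruntime-directml {dml} is older than 1.15")
    return warnings


if __name__ == "__main__":
    found = {name: installed_version(name) for name in ORT_DISTRIBUTIONS}
    print(found)
    for w in ort_warnings(found):
        print("warning:", w)
```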

fdwr commented Jul 11, 2023

PyTorch 2 did not work for me. So for anyone else seeing a bewildering torch.onnx.errors.UnsupportedOperatorError: Exporting the operator 'aten::scaled_dot_product_attention' to ONNX opset version 14 is not supported, use PyTorch 1.13, like OLive does in its requirements.txt.

Failed for me:

torch==2.0.1
diffusers==0.18.1
onnx==1.14.0
onnxruntime==1.15.0
onnxruntime-directml==1.15.0
numpy==1.24.3

Worked for me:

torch==1.13.1
diffusers==0.17.1   # 0.18.1 would probably work too, as the PyTorch version is the bigger factor
onnx==1.14.0
onnxruntime==1.15.0
onnxruntime-directml==1.15.0
numpy==1.21.6
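To compare an environment against the "worked for me" set above, a small sketch using `importlib.metadata`. The pins are copied from fdwr's list; singling out torch 2.x follows fdwr's report rather than a general rule. Local version suffixes such as `+cpu` are ignored when comparing, which is an assumption about how torch was installed. The helper names are illustrative only.

```python
# Sketch: diff the current environment against fdwr's reported working versions.
from importlib.metadata import PackageNotFoundError, version

WORKING_PINS = {
    "torch": "1.13.1",
    "diffusers": "0.17.1",
    "onnx": "1.14.0",
    "onnxruntime": "1.15.0",
    "onnxruntime-directml": "1.15.0",
    "numpy": "1.21.6",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(pins, installed):
    """Return {name: (pinned, installed)} for every package that differs from its pin."""
    diff = {}
    for name, wanted in pins.items():
        have = installed.get(name)
        # Drop a local suffix like "+cpu" or "+cu117" before comparing
        if have is None or have.split("+")[0] != wanted:
            diff[name] = (wanted, have)
    return diff


def torch_2_or_newer(installed):
    """True if torch is installed with major version 2 or above (fdwr's failing case)."""
    v = installed.get("torch")
    return v is not None and int(v.split(".")[0]) >= 2


if __name__ == "__main__":
    current = installed_versions(WORKING_PINS)
    for name, (wanted, have) in mismatches(WORKING_PINS, current).items():
        print(f"{name}: pinned {wanted}, installed {have}")
    if torch_2_or_newer(current):
        print("torch 2.x detected: fdwr hit the scaled_dot_product_attention export error with it")
```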

wd021 commented Jul 10, 2025

love me some 🖼️ prompts! come share the best with God Tier Prompts! 🧠 ✨


🤗 Stable Diffusion for AMD GPUs on Windows using DirectML

Requirements

Installation

Create a Folder to Store Stable Diffusion Related Files

Install 🤗 diffusers

Download the Stable Diffusion ONNX model

Scripts / Examples

Stable Diffusion Txt 2 Img on AMD GPUs

Stable Diffusion Img 2 Img on AMD GPUs

Stable Diffusion Inpainting on AMD GPUs

Example Txt2Img Script With More Features

Output

Convert Stable Diffusion model to ONNX format

Install wget for Windows

Convert Original Stable Diffusion to Diffusers (Ckpt File)

Convert Stable Diffusion Checkpoint to Onnx

Additional Tools (Optional)

Upscaling

Real-ESRGAN

RealSR ncnn Vulkan

SRMD ncnn Vulkan

Image Editing

ImageMagick

Photopea

FAQs

How can I clear cached models?

Can I download and install ort-nightly-directml instead of onnxruntime-directml?

How do you install or use different models?
