@In-line
Last active November 1, 2024 18:28
AMD 7900 XTX Stable Diffusion Web UI docker container (ROCM 5.5_rc4)

To use this container, first install docker and docker-compose.

You then need the now-deleted rocm/composable_kernel:ck_ub20.04_rocm5.5_rc4 docker image (an alpha build of ROCm). You can download the image using the BitTorrent magnet link in the files of this gist. After downloading, load it with docker load --input rocm5.5.tar.gz

Running is as simple as: sudo docker-compose up --build. After that, the Stable Diffusion Web UI can be accessed at http://127.0.0.1:3000
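Before starting the build, it can save time to confirm that the host exposes the two device nodes that the compose file maps into the container (/dev/kfd and /dev/dri). A minimal sketch in Python; the helper name missing_device_nodes is made up for this example and is not part of the gist:

```python
import os

# Device nodes that the compose file passes through to the container.
REQUIRED_NODES = ("/dev/kfd", "/dev/dri")


def missing_device_nodes(nodes=REQUIRED_NODES, exists=os.path.exists):
    """Return the device nodes from `nodes` that are absent on this host.

    `exists` is injectable so the check can be exercised without real hardware.
    """
    return [node for node in nodes if not exists(node)]


if __name__ == "__main__":
    missing = missing_device_nodes()
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("Both /dev/kfd and /dev/dri are present.")
```

If either node is reported missing, the amdgpu kernel driver is probably not loaded, and the container would not see the GPU regardless of the image used.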

Many thanks to @wsippel for putting together a step-by-step guide to get it working.

Disclaimer

This was tested on kernel 6.2.10 and mesa 23.0.0.

version: "3.9"
services:
  sd:
    build: .
    ports:
      - "3000:7860"
    volumes:
      - ./models:/SD/stable-diffusion-webui/models/
      - ./repositories:/SD/stable-diffusion-webui/repositories/
      - ./extensions:/SD/stable-diffusion-webui/extensions/
      - ./outputs:/SD/stable-diffusion-webui/outputs/
    devices:
      - '/dev/kfd:/dev/kfd'
      - '/dev/dri:/dev/dri'
    security_opt:
      - seccomp:unconfined
    group_add:
      - video
FROM rocm/composable_kernel:ck_ub20.04_rocm5.5_rc4
RUN mkdir /SD
# Clone SD
WORKDIR /SD
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
WORKDIR /SD/stable-diffusion-webui
RUN git reset --hard 22bcc7be428c94e9408f589966c2040187245d81
RUN apt update && apt install -y python3.8-venv
RUN python3 -m venv venv
# Activate VENV
ENV VIRTUAL_ENV=/SD/stable-diffusion-webui/venv
RUN python3 -m venv $VIRTUAL_ENV
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN python3 -m pip install --upgrade pip wheel
ENV HIP_VISIBLE_DEVICES=0
ENV PYTORCH_ROCM_ARCH="gfx1100"
ENV CMAKE_PREFIX_PATH=/SD/stable-diffusion-webui/venv/
ENV USE_CUDA=0
# Setup patched folder & compile dependencies
RUN mkdir -p /SD/stable-diffusion-webui/patched
RUN pip install cmake ninja
WORKDIR patched
# Remove old torch and torchvision
RUN pip uninstall -y torch torchvision
# Build pytorch
RUN wget https://github.com/pytorch/pytorch/releases/download/v2.0.0/pytorch-v2.0.0.tar.gz
RUN tar -xzvf pytorch-v2.0.0.tar.gz
WORKDIR /SD/stable-diffusion-webui/patched/pytorch-v2.0.0
RUN pip install -r requirements.txt
RUN pip install mkl mkl-include
RUN python3 tools/amd_build/build_amd.py
RUN python3 setup.py install
# Build vision
WORKDIR /SD/stable-diffusion-webui/patched/
RUN wget https://github.com/pytorch/vision/archive/refs/tags/v0.15.1.tar.gz
RUN tar -xzvf v0.15.1.tar.gz
WORKDIR /SD/stable-diffusion-webui/patched/vision-0.15.1
RUN python3 setup.py install
WORKDIR /SD/stable-diffusion-webui
# Patch requirements.txt to remove torch
# -i edits the file in place; ^torch\b matches only the torch package itself
RUN sed -i '/^torch\b/d' requirements.txt
RUN pip install -r requirements.txt
EXPOSE 7860/tcp
# Fix for "detected dubious ownership in repository" by rom1win.
RUN git config --global --add safe.directory '*'
CMD python3 launch.py --listen --disable-safe-unpickle
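The intent of the "Patch requirements.txt" step is to keep pip from replacing the torch that was just built from source. For illustration, the same filtering can be sketched in Python. This assumes that only the torch package itself should be dropped and that packages whose names merely start with "torch" should stay; the gist does not say which is intended, and drop_torch_requirement is a hypothetical helper name:

```python
import re

# Matches a requirement line for the package named exactly "torch",
# optionally followed by a version specifier, extras, marker, or URL.
_TORCH_LINE = re.compile(r"^\s*torch\s*([=<>!~\[;@].*)?$")


def drop_torch_requirement(lines):
    """Return `lines` without entries for the `torch` package itself."""
    return [line for line in lines if not _TORCH_LINE.match(line)]


if __name__ == "__main__":
    sample = ["torch", "torchdiffeq", "numpy"]
    print(drop_torch_requirement(sample))
```

A broader pattern that matches any line containing "torch" would also remove related packages, which may or may not be desired.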
@evshiron

evshiron commented Jul 2, 2023

Fyi, I am getting 10-20% performance boost with ROCm 5.6 compared to 5.5.1.

I am using sd-extension-system-info to benchmark and the results can be found here.

It would be appreciated if more data for Navi 3x were shared, so we know where we stand.

@rom1win

rom1win commented Jul 3, 2023

@evshiron I'm currently using your rocm5.5.1-ub22.04-base image in my docker environment. Will you also release a 5.6 version ?

@BloodBlight

@rom1win My gist is at 5.6:
https://gist.github.com/BloodBlight/0d36b33d215056395f34db26fb419a63

Seems to be working, but I am no expert.

@evshiron

evshiron commented Jul 4, 2023

@rom1win

rocm5.6-ub22.04-base is built just now.

Please note that the Docker images in ROCm LAB are mainly used for conceptual verification. As ROCm gradually shifts its support for Navi 3x to performance optimization now, there won't be many updates.

The changes to update from rocm5.5.1-ub22.04-base to rocm5.6-ub22.04-base are minimal, but the storage space used for building exceeds the limit of the free runners on GitHub, making it more difficult to update the image.

It's recommended to obtain the Dockerfiles from https://github.com/evshiron/rocm_lab/tree/master/dockerfiles, then modify and build them according to your needs.

@BloodBlight

No worries. You have done a great job :) Maybe what he needs is a base image with only ROCm 5.6 installed, so I updated it.

@rom1win

rom1win commented Jul 4, 2023

@evshiron Thank you very much, I understand your issue with the storage limit on GitHub :/ I will build directly from the Dockerfile from now on.

Also, I wanted to know if the rocm5.5.1-base image is stable on your side. On my side a1111-webui is really unstable and crashes very often. I am monitoring VRAM and RAM and I don't seem to be going over the limit (13GB of 24GB available). I didn't have this issue when my Dockerfile used the rocm/composable_kernel:ck_ub20.04_rocm5.5_rc4 image.

@BloodBlight I saw your gist and wanted to try it, but saw that it is based on Ubuntu 20.04. I have some extensions that just don't work if I don't have python3.10 installed. I could fix that, but for testing, updating my existing Dockerfile to rocm5.6-ub22.04-base is simply the fastest option.

@evshiron

evshiron commented Jul 5, 2023

@rom1win

I actually don't use Docker much to run Stable Diffusion. This base image was built for testing when ROCm 5.5.1 released.

Based on your description of "unstable and crash" without specific logs, I tested it using this image and here are the steps (my host uses ROCm 5.6):

  1. Run the rocm_lab:rocm5.5.1-ub22.04-base image with parameters like this
  2. Run apt install git python3-pip python3-venv to install missing dependencies
  3. Run git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui to clone the a1111-webui repo
  4. Run pip install -r requirements.txt for a1111-webui's dependencies
  5. Run pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.5 for torch
  6. Run export HSA_OVERRIDE_GFX_VERSION=11.0.0 and export HIP_VISIBLE_DEVICES=0
  7. Run python3 launch.py --listen --enable-insecure-extension-access --opt-sdp-attention to launch
  8. Install the system-info extension and restart. The benchmarked performance is at the level of ROCm 5.5 (15it/s)
  9. Go to "txt2img" and generate a 512x512 image with 2x Hires. fix, and catch a HIP out of memory exception
  10. Run export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.8,max_split_size_mb:512 and restart. Now 1024x1024 passes
  11. Catch a HIP out of memory exception when generating 768x768 with 2x Hires. fix
  12. Remove --opt-sdp-attention from the arguments and restart. Now 1536x1536 passes, but it's slow

I haven't observed any other crashes for now, and for generating large images, it is recommended to use Tiled VAE. If using Vlad's Automatic, the above optimizations are automatically enabled. Even without Tiled VAE, it can generate larger images. Don't use --no-half-vae.
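The environment variables and flags from the steps above can be collected in one place. The following is a minimal sketch in Python, not a tested launcher: build_launch_env and build_launch_command are hypothetical helper names, and the values are copied from the steps rather than tuned.

```python
import os
import shlex


def build_launch_env(base_env=None, gpu_index=0):
    """Return a copy of the environment with the ROCm settings from the steps."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"
    env["HIP_VISIBLE_DEVICES"] = str(gpu_index)
    # Allocator settings that let the 1024x1024 Hires. fix case pass in the steps.
    env["PYTORCH_HIP_ALLOC_CONF"] = (
        "garbage_collection_threshold:0.8,max_split_size_mb:512"
    )
    return env


def build_launch_command(use_sdp_attention=True):
    """Return the launch.py command line; drop sdp attention for larger images."""
    cmd = ["python3", "launch.py", "--listen", "--enable-insecure-extension-access"]
    if use_sdp_attention:
        cmd.append("--opt-sdp-attention")
    return cmd


if __name__ == "__main__":
    # Only print the command here; nothing is launched.
    print(shlex.join(build_launch_command()))
```

To actually start the web UI, the two results could be passed to subprocess.run(cmd, env=env, cwd=...) with cwd pointing at the stable-diffusion-webui checkout.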

@GianlucaMattei

GianlucaMattei commented Jul 13, 2023

I can't figure out the straightforward way to run SD on Linux (22.04) with my 7900XT. I am new to docker, and from what I can read the tutorial above is no longer up to date (it also gives me out of memory errors, by the way).

These are the steps I did:

# I installed rocm5.5 by 
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.5

# I downloaded SD and I installed dependencies
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
sudo apt install git python3-pip python3-venv
cd stable-diffusion-webui
pip install -r requirements.txt

# then I did something I cant understand .....
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HIP_VISIBLE_DEVICES=0
export PYTORCH_ROCM_ARCH="gfx1100"

# finally 
python3 launch.py --listen --enable-insecure-extension-access --opt-sdp-attention

Python 3.10.9 (main, Mar  8 2023, 10:47:38) [GCC 11.2.0]
Version: v1.4.1
Commit hash: f865d3e11647dfd6c7b2cdf90dde24680e58acd8
Traceback (most recent call last):
  File "/home/gm/stable-diffusion-webui/launch.py", line 38, in <module>
    main()
  File "/home/gm/stable-diffusion-webui/launch.py", line 29, in main
    prepare_environment()
  File "/home/gm/stable-diffusion-webui/modules/launch_utils.py", line 268, in prepare_environment
    raise RuntimeError(
RuntimeError: Torch is not able to use GPU; add --skip-torch-cuda-test to COMMANDLINE_ARGS variable to disable this check



# thus I tried
python3 launch.py --listen --enable-insecure-extension-access --opt-sdp-attention --skip-torch-cuda-test

Python 3.10.9 (main, Mar  8 2023, 10:47:38) [GCC 11.2.0]
Version: v1.4.1
Commit hash: f865d3e11647dfd6c7b2cdf90dde24680e58acd8
Installing clip
Installing open_clip
Cloning Stable Diffusion into /home/gm/stable-diffusion-webui/repositories/stable-diffusion-stability-ai...
Cloning K-diffusion into /home/gm/stable-diffusion-webui/repositories/k-diffusion...
Cloning CodeFormer into /home/gm/stable-diffusion-webui/repositories/CodeFormer...
Cloning BLIP into /home/gmattei/gm-diffusion-webui/repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments: --listen --enable-insecure-extension-access --opt-sdp-attention --skip-torch-cuda-test
Traceback (most recent call last):
  File "/home/gm/stable-diffusion-webui/launch.py", line 38, in <module>
    main()
  File "/home/gm/stable-diffusion-webui/launch.py", line 34, in main
    start()
  File "/home/gm/stable-diffusion-webui/modules/launch_utils.py", line 340, in start
    import webui
  File "/home/gm/stable-diffusion-webui/webui.py", line 23, in <module>
    from modules import paths, timer, import_hook, errors, devices  # noqa: F401
  File "/home/gm/stable-diffusion-webui/modules/paths.py", line 5, in <module>
    import modules.safe  # noqa: F401
  File "/home/gm/stable-diffusion-webui/modules/safe.py", line 6, in <module>
    import torch
  File "/home/gm/anaconda3/envs/draw/lib/python3.10/site-packages/torch/__init__.py", line 229, in <module>
    from torch._C import *  # noqa: F403
ImportError: libroctx64.so.4: cannot open shared object file: No such file or directory


# so lets try with ( cant still understand the differences between webui.sh and launch.py ... )
bash webui.sh  --listen --enable-insecure-extension-access --opt-sdp-attention --skip-torch-cuda-test

# when I load a model or click generate:
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'

What's wrong between AMD and SD?

EDIT:
One more thing: should I revert this step
Run export HSA_OVERRIDE_GFX_VERSION=11.0.0 and export HIP_VISIBLE_DEVICES=0
in order to avoid further problems?

@evshiron

evshiron commented Jul 13, 2023

@GianlucaMattei

Did you install ROCm via amdgpu-install beforehand?

curl -O https://repo.radeon.com/amdgpu-install/5.6/ubuntu/jammy/amdgpu-install_5.6.50600-1_all.deb
sudo dpkg -i amdgpu-install_5.6.50600-1_all.deb
# opencl might cause issues later, so don't add opencl unless you know what you are doing
sudo amdgpu-install --usecase=graphics,rocm
sudo reboot

If ROCm is installed, can you run rocminfo and rocm-smi and check the printed logs?
Both commands should exist and work if ROCm is correctly installed, and you can find your RX 7900 XT in the log.

export HSA_OVERRIDE_GFX_VERSION=11.0.0 makes every GPU recognized as RX 7900 XT/XTX.
It's recommended for generally every application running with Navi 31 (not sure about other Navi 3x but should work too).
This should not cause problems on Navi 31, but if you are concerned you can remove it.
stable-diffusion-webui might work without it, but text-generation-webui needs both of them if I remember correctly.

export HIP_VISIBLE_DEVICES=0 makes only the first GPU visible to the application.
This keeps the application from using an unsupported iGPU and is recommended too.
If your RX 7900 XT is not the first GPU listed in rocminfo, you should change the number accordingly.
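Finding the right number for HIP_VISIBLE_DEVICES can be sketched in Python by parsing the text that rocminfo prints. This assumes that HIP numbers devices in the order the GPU agents appear in rocminfo, with CPU agents skipped; that ordering is an assumption here and is worth confirming on a multi-GPU system. The function names are made up for this example.

```python
import re


def gpu_marketing_names(rocminfo_text):
    """Return the marketing names of GPU agents in the order rocminfo lists them."""
    names = []
    current_name = ""
    for line in rocminfo_text.splitlines():
        match = re.match(r"\s*Marketing Name:\s*(.*\S)?\s*$", line)
        if match:
            # Remember the name; the agent's Device Type line follows later.
            current_name = match.group(1) or ""
            continue
        match = re.match(r"\s*Device Type:\s*(\S+)", line)
        if match and match.group(1) == "GPU":
            names.append(current_name)
    return names


def hip_index_for(rocminfo_text, name_fragment):
    """Return the assumed HIP index of the first GPU whose name contains the fragment."""
    for index, name in enumerate(gpu_marketing_names(rocminfo_text)):
        if name_fragment.lower() in name.lower():
            return index
    return None
```

The text can come from subprocess.run(["rocminfo"], capture_output=True, text=True).stdout; a result of None means no GPU agent matched the given name.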

@GianlucaMattei

@evshiron
I had an old version.
I installed the new one, but it still doesn't work (RuntimeError: "LayerNormKernelImpl" not implemented for 'Half')

python3 launch.py --listen --enable-insecure-extension-access --opt-sdp-attention --skip-torch-cuda-test
Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
Version: v1.4.1
Commit hash: f865d3e11647dfd6c7b2cdf90dde24680e58acd8
Installing gfpgan
Installing clip
Installing open_clip
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments: --listen --enable-insecure-extension-access --opt-sdp-attention --skip-torch-cuda-test
/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 1.16.0-unknown is an invalid version and will not be supported in a future release
  warnings.warn(
/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 1.1build1 is an invalid version and will not be supported in a future release
  warnings.warn(
/usr/lib/python3/dist-packages/pkg_resources/__init__.py:116: PkgResourcesDeprecationWarning: 0.1.43ubuntu1 is an invalid version and will not be supported in a future release
  warnings.warn(
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx', memory monitor disabled
Loading weights [a0d9fe1383] from /home/gm/stable-diffusion-webui/models/Stable-diffusion/thisisreal_v20.safetensors
Exception in thread Thread-3 (first_time_calculation):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/gm/stable-diffusion-webui/modules/devices.py", line 170, in first_time_calculation
    linear(x)
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gm/stable-diffusion-webui/extensions-builtin/Lora/lora.py", line 400, in lora_Linear_forward
    return torch.nn.Linear_forward_before_lora(self, input)
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
preload_extensions_git_metadata for 7 extensions took 0.00s
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 3.2s (import torch: 0.7s, import gradio: 0.4s, import ldm: 0.2s, other imports: 0.7s, setup codeformer: 0.4s, load scripts: 0.2s, create ui: 0.4s, gradio launch: 0.2s).
Creating model from config: /home/gm/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying attention optimization: sdp... done.
Textual inversion embeddings loaded(0): 
loading stable diffusion model: RuntimeError
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/gm/stable-diffusion-webui/webui.py", line 306, in load_model
    shared.sd_model  # noqa: B018
  File "/home/gm/stable-diffusion-webui/modules/shared.py", line 726, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "/home/gm/stable-diffusion-webui/modules/sd_models.py", line 422, in get_sd_model
    load_model()
  File "/home/gm/stable-diffusion-webui/modules/sd_models.py", line 510, in load_model
    sd_model.cond_stage_model_empty_prompt = sd_model.cond_stage_model([""])
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gm/stable-diffusion-webui/modules/sd_hijack_clip.py", line 229, in forward
    z = self.process_tokens(tokens, multipliers)
  File "/home/gm/stable-diffusion-webui/modules/sd_hijack_clip.py", line 254, in process_tokens
    z = self.encode_with_transformers(tokens)
  File "/home/gm/stable-diffusion-webui/modules/sd_hijack_clip.py", line 302, in encode_with_transformers
    outputs = self.wrapped.transformer(input_ids=tokens, output_hidden_states=-opts.CLIP_stop_at_last_layers)
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gm/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 811, in forward
    return self.text_model(
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gm/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 721, in forward
    encoder_outputs = self.encoder(
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gm/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 650, in forward
    layer_outputs = encoder_layer(
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gm/.local/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 378, in forward
    hidden_states = self.layer_norm1(hidden_states)
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 190, in forward
    return F.layer_norm(
  File "/home/gm/.local/lib/python3.10/site-packages/torch/nn/functional.py", line 2515, in layer_norm
    return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'


Stable diffusion model failed to load

rocminfo
This is the output:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    12th Gen Intel(R) Core(TM) i5-12400F
  Uuid:                    CPU-XX                             
  Marketing Name:          12th Gen Intel(R) Core(TM) i5-12400F
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      49152(0xc000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   5600                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            12                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    16174776(0xf6ceb8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    16174776(0xf6ceb8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16174776(0xf6ceb8) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1100                            
  Uuid:                    GPU-5b4cb5dc00000000               
  Marketing Name:          Radeon RX 7900 XT                  
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      32(0x20) KB                        
    L2:                      6144(0x1800) KB                    
    L3:                      81920(0x14000) KB                  
  Chip ID:                 29772(0x744c)                      
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2075                               
  BDFID:                   768                                
  Internal Node ID:        1                                  
  Compute Unit:            84                                 
  SIMDs per CU:            2                                  
  Shader Engines:          6                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    20955136(0x13fc000) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***           

For rocm-smi

========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU  Temp (DieEdge)  AvgPwr  SCLK   MCLK   Fan  Perf  PwrCap  VRAM%  GPU%  
0    41.0c           16.0W   32Mhz  96Mhz  0%   auto  265.0W    2%   2%    
====================================================================================
=============================== End of ROCm SMI Log ================================

@evshiron

evshiron commented Jul 14, 2023

@GianlucaMattei

Hmmm, the outputs of rocminfo and rocm-smi look good.

Can you remove --skip-torch-cuda-test and launch again? When it fails, try sudo dmesg and see if there are any abnormal logs from amdgpu.
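Errors such as "LayerNormKernelImpl" and "addmm_impl_cpu_" not implemented for 'Half' come from PyTorch's CPU kernels, which suggests the model is running on the CPU rather than on the GPU. The "Found no NVIDIA driver" warning in the log further hints that the imported torch may be a CUDA build rather than a ROCm one. A quick way to check which build Python picks up, sketched with a hypothetical helper classify_torch_build:

```python
def classify_torch_build(hip_version, cuda_version, gpu_available):
    """Describe a torch install from its version attributes and GPU visibility."""
    if hip_version is None:
        if cuda_version is not None:
            return "CUDA build: reinstall torch from a ROCm wheel index"
        return "CPU-only build: reinstall torch from a ROCm wheel index"
    if not gpu_available:
        return "ROCm build, but no GPU is visible to torch"
    return "ROCm build with a usable GPU"


def inspect_installed_torch():
    """Import torch and classify it; report import failures instead of raising."""
    try:
        import torch
    except ImportError as exc:  # also raised for missing shared libraries
        return f"torch failed to import: {exc}"
    return classify_torch_build(
        torch.version.hip, torch.version.cuda, torch.cuda.is_available()
    )


if __name__ == "__main__":
    print(inspect_installed_torch())
```

On ROCm builds torch.version.hip is a version string and torch.cuda.is_available() reports the AMD GPU; on other builds torch.version.hip is None. Running the check with the same interpreter that launch.py uses matters, since a torch in ~/.local can shadow the one that was meant to be used.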

Also, if you have Docker daemon (not Docker Desktop) installed, would you mind running:

docker run -ti --net=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -e HSA_OVERRIDE_GFX_VERSION=11.0.0 --name rocm5.5-automatic ghcr.io/evshiron/rocm_lab:rocm5.5-automatic

This was a working example a few weeks ago. If it doesn't work, can you post the logs here?

@GianlucaMattei

GianlucaMattei commented Jul 16, 2023

Ok, this is the log from amdgpu you were looking for (I think)
It seems there are a lot of problems ....

sudo dmesg | grep amdgpu

[    1.803496] [drm] amdgpu kernel modesetting enabled.
[    1.803498] [drm] amdgpu version: 6.1.5
[    1.803541] amdgpu: CRAT table not found
[    1.803542] amdgpu: Virtual CRAT table created for CPU
[    1.803547] amdgpu: Topology: Add CPU node
[    1.858931] amdgpu: PeerDirect support was initialized successfully
[    1.858996] amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
[    1.861612] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
[    1.861615] amdgpu: ATOM BIOS: 113-D70401XT-P10
[    1.862014] amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
[    1.862331] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[    1.862976] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
[    1.862980] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    1.863034] amdgpu 0000:03:00.0: amdgpu: MEM ECC is not presented.
[    1.863036] amdgpu 0000:03:00.0: amdgpu: SRAM ECC is not presented.
[    1.863112] amdgpu 0000:03:00.0: BAR 2: releasing [mem 0x4010000000-0x40101fffff 64bit pref]
[    1.863116] amdgpu 0000:03:00.0: BAR 0: releasing [mem 0x4000000000-0x400fffffff 64bit pref]
[    1.863161] amdgpu 0000:03:00.0: BAR 0: assigned [mem 0x4800000000-0x4fffffffff 64bit pref]
[    1.863173] amdgpu 0000:03:00.0: BAR 2: assigned [mem 0x4400000000-0x44001fffff 64bit pref]
[    1.863250] amdgpu 0000:03:00.0: amdgpu: VRAM: 20464M 0x0000008000000000 - 0x00000084FEFFFFFF (20464M used)
[    1.863254] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    1.863256] amdgpu 0000:03:00.0: amdgpu: AGP: 267878400M 0x0000008800000000 - 0x0000FFFFFFFFFFFF
[    1.863333] [drm] amdgpu: 20464M of VRAM memory ready
[    1.863335] [drm] amdgpu: 7897M of GTT memory ready.
[    1.864233] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[    2.112892] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    2.112898] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    2.112928] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000037, smu fw if version = 0x0000003d, smu fw program = 0, smu fw version = 0x004e5800 (78.88.0)
[    2.112934] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[    2.271382] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[    2.477319] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[    2.478964] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[    2.478983] amdgpu: sdma_bitmap: fff0
[    2.546219] amdgpu: HMM registered 20464MB device memory
[    2.546228] amdgpu: SRAT table not found
[    2.546228] amdgpu: Virtual CRAT table created for GPU
[    2.546347] amdgpu: Topology: Add dGPU node [0x744c:0x1002]
[    2.546349] kfd kfd: amdgpu: added device 1002:744c
[    2.546360] amdgpu 0000:03:00.0: amdgpu: SE 6, SH per SE 2, CU per SH 8, active_cu_number 84
[    2.546419] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    2.546420] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    2.546421] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    2.546421] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[    2.546422] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[    2.546422] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[    2.546423] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[    2.546423] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[    2.546424] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[    2.546424] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    2.546425] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[    2.546425] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 1
[    2.546426] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 1
[    2.546426] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 1
[    2.546427] amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[    2.548877] amdgpu: legacy kernel without apple_gmux_detect()
[    2.548990] amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
[    2.549204] [drm] Initialized amdgpu 3.53.0 20150101 for 0000:03:00.0 on minor 0
[    2.557099] fbcon: amdgpudrmfb (fb0) is primary device
[    2.557101] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[    4.994714] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])

Running:

docker run -ti --net=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -e HSA_OVERRIDE_GFX_VERSION=11.0.0 --name rocm5.5-automatic ghcr.io/evshiron/rocm_lab:rocm5.5-automatic

I get

docker: unknown server OS: .
See 'docker run --help'.

But I don't know whether I have the Docker daemon or Docker Desktop... I didn't know there were two Dockers at all.

Edit:

I ran the last command again with sudo (Docker commands only work with sudo for me):

sudo docker run -ti --net=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -e HSA_OVERRIDE_GFX_VERSION=11.0.0 --name rocm5.5-automatic ghcr.io/evshiron/rocm_lab:rocm5.5-automatic
docker: Error response from daemon: Conflict. The container name "/rocm5.5-automatic" is already in use by container "1440832e215a029b3886b26c055055a220ada45e2c9cd5c79048f4844ea89d5c". You have to remove (or rename) that container to be able to reuse that name.
See 'docker run --help'.

Thus I did

sudo docker remove 1440832e215a029b3886b26c055055a220ada45e2c9cd5c79048f4844ea89d5c

then

sudo docker run -ti --net=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G -e HSA_OVERRIDE_GFX_VERSION=11.0.0 --name rocm5.5-automatic ghcr.io/evshiron/rocm_lab:rocm5.5-automatic

works, even if it produces very poor images
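
The name conflict above comes from an earlier container that still holds the name rocm5.5-automatic. A minimal sketch of a wrapper that clears such a stale container before starting a new one; run_fresh and the DOCKER variable are hypothetical names, not part of Docker:

```shell
# Hypothetical helper: remove any old container with this name, then run a new one.
# DOCKER defaults to "docker"; set DOCKER="sudo docker" if your user needs sudo.
run_fresh() {
  name="$1"
  shift
  # Ignore the error when no container with this name exists.
  ${DOCKER:-docker} rm -f "$name" >/dev/null 2>&1 || true
  ${DOCKER:-docker} run --name "$name" "$@"
}

# Example (assumed invocation, same image and options as above):
# DOCKER="sudo docker" run_fresh rocm5.5-automatic -ti --net=host \
#   --device=/dev/kfd --device=/dev/dri --group-add video \
#   -e HSA_OVERRIDE_GFX_VERSION=11.0.0 ghcr.io/evshiron/rocm_lab:rocm5.5-automatic
```

Removing the container discards anything stored inside it. If the old container holds data you want to keep, docker start -ai rocm5.5-automatic reuses it instead of creating a new one.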

BTW, now launching

python3 launch.py --listen --enable-insecure-extension-access --opt-sdp-attention --skip-torch-cuda-test

returns (I'll write only the last line of the output)

OSError: libtorch_hip.so: cannot open shared object file: No such file or directory

Sorry for writing a book but outputs are very long...

evshiron commented Jul 17, 2023

@GianlucaMattei

Ok, this is the log from amdgpu you were looking for (I think)

The dmesg logs you posted are the normal initialization logs from system boot, and they look OK.

works, even if it produces very poor images

So it generates. For image quality, you can choose various models and settings once it's settled.

The libtorch_hip.so error should be an installation issue. Let's start from scratch:

# set up groups if it hasn't been done previously, reboot is needed
sudo usermod -aG video $USER
sudo usermod -aG render $USER

mkdir test
cd test

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui

# set up venv
python3 -m venv venv
source venv/bin/activate

# install dependencies
pip3 install -r requirements.txt
# replace torch with the rocm one
pip3 uninstall torch torchvision
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.5

# download a launch script from gist, you can create it yourself if you are concerned
curl https://gist.githubusercontent.com/evshiron/8cf4de34aa01e217ce178b8ed54a2c43/raw/e5743505afe6b2a329908bbefda93d98b98940ac/launch.sh > launch.sh

# launch the webui
bash launch.sh
launch.log
(venv) user@hostname:~/test/stable-diffusion-webui$ bash launch.sh 
Python 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
Version: v1.4.1
Commit hash: f865d3e11647dfd6c7b2cdf90dde24680e58acd8
Installing clip
Installing open_clip
Cloning Stable Diffusion into /home/user/test/stable-diffusion-webui/repositories/stable-diffusion-stability-ai...
Cloning K-diffusion into /home/user/test/stable-diffusion-webui/repositories/k-diffusion...
Cloning CodeFormer into /home/user/test/stable-diffusion-webui/repositories/CodeFormer...
Cloning BLIP into /home/user/test/stable-diffusion-webui/repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements
Launching Web UI with arguments: --listen --enable-insecure-extension-access --opt-sdp-attention
No module 'xformers'. Proceeding without it.
Downloading: "https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors" to /home/user/test/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors

100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 3.97G/3.97G [06:22<00:00, 11.2MB/s]
Calculating sha256 for /home/user/test/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors: preload_extensions_git_metadata for 7 extensions took 0.00s
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 385.1s (import torch: 0.7s, import gradio: 0.5s, import ldm: 0.2s, other imports: 0.3s, list SD models: 382.8s, load scripts: 0.2s, create ui: 0.2s).
6ce0161689b3853acaa03779ec93eafe75a02f4ced659bee03f50797806fa2fa
Loading weights [6ce0161689] from /home/user/test/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /home/user/test/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying attention optimization: sdp... done.
Textual inversion embeddings loaded(0): 
Model loaded in 3.6s (calculate hash: 2.4s, load weights from disk: 0.1s, create model: 0.2s, apply weights to model: 0.2s, apply half(): 0.2s, move model to device: 0.2s, calculate empty prompt: 0.1s).

Now open http://127.0.0.1:7860, enter river in the "Prompt" box, click the "Generate" button, and it should generate an image of a river.

If it doesn't work, try the above steps with a new user, or with a fresh Ubuntu installation. Post logs if issues remain.
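
Before retrying, it can help to confirm that the group setup from the first step took effect. A minimal sketch; missing_groups is a hypothetical helper name, and it only inspects the group list you pass in:

```shell
# Hypothetical helper: print each required group missing from a space-separated list.
# $1 is the list of groups the user has; the remaining arguments are required groups.
missing_groups() {
  have=" $1 "
  shift
  for g in "$@"; do
    case "$have" in
      *" $g "*) ;;                 # group present, nothing to report
      *) printf '%s\n' "$g" ;;     # group absent
    esac
  done
}

# Example: check the current user, then confirm the GPU device nodes exist.
# missing_groups "$(id -nG)" video render
# ls -l /dev/kfd /dev/dri
```

Empty output means both groups are active in the current session. Group changes made with usermod only apply after logging in again or rebooting.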

Extra info can be found here.

Sega999 commented Jul 27, 2023

I can't get it to build; I keep getting a Python error:

`(vvv) system-name:~/dockstab$ sudo docker-compose up --build
Building sd
Sending build context to Docker daemon 18.7GB
Step 1/38 : FROM rocm/composable_kernel:ck_ub20.04_rocm5.5_rc4
---> 3593424bfc2d
Step 2/38 : RUN mkdir /SD
---> Using cache
---> 8ed3bb34be5b
Step 3/38 : WORKDIR /SD
---> Using cache
---> 0e8b06d6e215
Step 4/38 : RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
---> Using cache
---> 0157c649186d
Step 5/38 : WORKDIR /SD/stable-diffusion-webui
---> Using cache
---> af2b8fdbdcf9
Step 6/38 : RUN git reset --hard 22bcc7be428c94e9408f589966c2040187245d81
---> Using cache
---> 417df67433a2
Step 7/38 : RUN apt update && apt install python3.8-venv
---> Running in d0dc73637440

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Err:2 http://compute-artifactory.amd.com/artifactory/list/rocm-release-archive-20.04-deb 5.5 InRelease
Something wicked happened resolving 'compute-artifactory.amd.com:http' (-5 - No address associated with hostname)
Err:3 http://artifactory-cdn.amd.com/artifactory/list/amdgpu-deb-remote focal/builds InRelease
Something wicked happened resolving 'artifactory-cdn.amd.com:http' (-5 - No address associated with hostname)
Get:4 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:5 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
Get:1 http://mirrors.edge.kernel.org/ubuntu focal InRelease [265 kB]
Get:6 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [2597 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:8 http://mirrors.edge.kernel.org/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:9 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Get:10 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
Get:11 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
Get:12 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
Get:13 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2909 kB]
Get:14 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [1083 kB]
Get:15 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [29.3 kB]
Get:16 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
Get:17 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [32.0 kB]
Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [2738 kB]
Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [3390 kB]
Get:20 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1383 kB]
Get:21 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [28.6 kB]
Get:22 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
Get:23 http://mirrors.edge.kernel.org/ubuntu focal/main amd64 Packages [1275 kB]
Fetched 40.6 MB in 48s (837 kB/s)
Reading package lists...
Building dependency tree...
Reading state information...
W: Failed to fetch http://artifactory-cdn.amd.com/artifactory/list/amdgpu-deb-remote/dists/focal/builds/InRelease Something wicked happened resolving 'artifactory-cdn.amd.com:http' (-5 - No address associated with hostname)
W: Failed to fetch http://compute-artifactory.amd.com/artifactory/list/rocm-release-archive-20.04-deb/dists/5.5/InRelease Something wicked happened resolving 'compute-artifactory.amd.com:http' (-5 - No address associated with hostname)
W: Some index files failed to download. They have been ignored, or old ones used instead.
87 packages can be upgraded. Run 'apt list --upgradable' to see them.

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
libpython3.8 libpython3.8-dev libpython3.8-minimal libpython3.8-stdlib
python3.8 python3.8-dev python3.8-minimal
Suggested packages:
python3.8-doc
The following NEW packages will be installed:
python3.8-venv
The following packages will be upgraded:
libpython3.8 libpython3.8-dev libpython3.8-minimal libpython3.8-stdlib
python3.8 python3.8-dev python3.8-minimal
7 upgraded, 1 newly installed, 0 to remove and 80 not upgraded.
Need to get 10.8 MB of archives.
After this operation, 26.6 kB of additional disk space will be used.
Do you want to continue? [Y/n] Abort.
The command '/bin/sh -c apt update && apt install python3.8-venv' returned a non-zero code: 1
ERROR: Service 'sd' failed to build : Build failed
`

evshiron commented Jul 27, 2023

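In the Dockerfile, change RUN apt update && apt install python3.8-venv to RUN apt update && apt install -y python3.8-venv. Without -y, apt asks "Do you want to continue? [Y/n]", and the non-interactive build cannot answer, so the step aborts.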
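
The same change can be applied from the shell. A minimal sketch, assuming the Dockerfile contains the line exactly as it appears in the build log; add_apt_yes_flag is a hypothetical helper name:

```shell
# Hypothetical helper: add -y to the apt install line of the given Dockerfile.
# A backup of the original file is kept next to it with a .bak suffix.
add_apt_yes_flag() {
  sed -i.bak 's/apt install python3\.8-venv/apt install -y python3.8-venv/' "$1"
}

# Example:
# add_apt_yes_flag Dockerfile && sudo docker-compose up --build
```

Running it twice is harmless, because the pattern no longer matches once -y is present.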

Btw, the tutorial is outdated as of now. If you aren't a big fan of Docker containers, an easy guide can be found here:

Which is a compilation of comments above.

Sega999 commented Jul 27, 2023

thank you

f1am3d commented Aug 28, 2023

@evshiron thanks! This is working. 👍

f1am3d commented Aug 28, 2023

Failed on step #21.

f1am3d commented Aug 28, 2023

I cant get it to build I keep getting a python error

The command '/bin/sh -c apt update && apt install python3.8-venv' returned a non-zero code: 1 ERROR: Service 'sd' failed to build : Build failed

Same here. Hey author, did you test it at all? @In-line
