pytorch/pytorch#94891 (comment) The PCIe atomics issue should already be fixed. Currently I cannot verify this.
- Install PyTorch with ROCm support
Follow the official installation guide: https://pytorch.org/get-started/locally/#linux-installation
Choose [Stable] -> [Linux] -> [Pip] -> [Python] -> [ROCm]. The command should look something like:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
Remember the ROCm version here.
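Once the wheel is installed, you can confirm which ROCm/HIP version PyTorch was built against before touching the driver. A minimal sketch (the helper name `rocm_build_version` is my own; `torch.version.hip` is the real attribute, and the check degrades gracefully when torch is not installed):

```python
import importlib.util

def rocm_build_version():
    """Return the HIP version the installed torch was built with, or None."""
    if importlib.util.find_spec("torch") is None:
        return None  # torch is not installed in this environment
    import torch
    # torch.version.hip is None on CUDA/CPU-only builds,
    # and a string such as "5.4.22803" on ROCm builds.
    return torch.version.hip

print("HIP build version:", rocm_build_version())
```

The major.minor.patch prefix of that string is the ROCm version the driver install below should match.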
- Install ROCm drivers
- Download the installation script; it MUST BE THE SAME ROCm VERSION AS PYTORCH
https://repo.radeon.com/amdgpu-install/5.4.2/ubuntu/focal/
File name should be: amdgpu-install_[version]_all.deb
- Install the deb package
sudo dpkg -i ./amdgpu-install*.deb
- Run the installation script
amdgpu-install --usecase=graphics,rocm,opencl -y --accept-eula
Note: the Ryzen 7 5825U iGPU architecture is Vega 8, which is supposed to use legacy OpenCL.
If you are using another AMD GPU or APU, modifications may be required.
- Add current user to groups
To access the devices /dev/kfd, /dev/dri/card0 and /dev/dri/renderD*, the current user must be added to the render and video groups.
sudo usermod -a -G render $LOGNAME
sudo usermod -a -G video $LOGNAME
If not added, only root is allowed to use ROCm.
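The group requirement above can also be checked from code before ROCm complains. A sketch (the helper names are my own; `grp` is the standard Unix group-database module):

```python
def missing_groups(username, group_db, required=("render", "video")):
    """Return the required groups `username` is not a member of.

    `group_db` maps group name -> list of member usernames, so the check
    can be exercised without touching the real /etc/group.
    """
    return [g for g in required if username not in group_db.get(g, [])]

def system_group_db():
    """Build the group map from the real group database (Linux only)."""
    import grp  # Unix-only stdlib module
    return {g.gr_name: list(g.gr_mem) for g in grp.getgrall()}
```

Usage would be something like `missing_groups(os.environ["LOGNAME"], system_group_db())`. One caveat: `gr_mem` lists supplementary members only, so a user whose primary group is `video` would show up as "missing" here.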
- Reboot the system
- Add environment variables in .bashrc
The Ryzen 7 5825U is gfx90c, which should be compatible with gfx900, so we force ROCm to treat it as gfx900.
export PYTORCH_ROCM_ARCH=gfx900
export HSA_OVERRIDE_GFX_VERSION=9.0.0
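The overrides only work if they are in the environment before the ROCm/HSA runtime initializes, which is why `.bashrc` is suggested. An alternative, if you don't want shell-wide settings, is to set them at the top of a script before `import torch`. A minimal sketch:

```python
import os

# Must be set BEFORE torch (and thus the ROCm/HSA runtime) is imported.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.0.0"  # treat gfx90c as gfx900
os.environ["PYTORCH_ROCM_ARCH"] = "gfx900"        # mainly relevant when building from source

# import torch  # import only after the overrides are in place
```

Setting them after `import torch` has no effect, because the runtime has already probed the GPU by then.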
- Check iGPU status
rocm-smi
From the output, you can see GPU[0].
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
ERROR: GPU[0] : sclk clock is unsupported
================================================================================
GPU[0] : Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%
0    43.0c           0.003W  None  1200Mhz  0%   auto  Unsupported  43%    0%
================================================================================
============================= End of ROCm SMI Log ==============================
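If you want to watch VRAM or temperature programmatically rather than eyeballing `rocm-smi`, the concise row can be split into named fields. A sketch against the sample output above (the column order is an assumption that matches this sample; the exact layout varies between ROCm releases, so treat this as illustration only):

```python
def parse_concise_line(line):
    """Parse one data row of `rocm-smi` concise output into a dict.

    Column order assumed: GPU, Temp, AvgPwr, SCLK, MCLK, Fan, Perf,
    PwrCap, VRAM%, GPU% -- matching the sample above.
    """
    keys = ("gpu", "temp", "avg_pwr", "sclk", "mclk",
            "fan", "perf", "pwr_cap", "vram_pct", "gpu_pct")
    return dict(zip(keys, line.split()))

sample = "0  43.0c  0.003W  None  1200Mhz  0%  auto  Unsupported  43%  0%"
info = parse_concise_line(sample)
print(info["temp"], info["vram_pct"])  # -> 43.0c 43%
```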
Also, you can check OpenCL status
clinfo
From the output you can see that the GPU has been detected.
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3513.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name:
Device Topology: PCI[ B#4, D#0, F#0 ]
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
- Test run
import torch
print(torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
print(torch.cuda.get_device_properties(i))
Output:
1
_CudaDeviceProperties(name='AMD Radeon Graphics', major=9, minor=0, total_memory=1024MB, multi_processor_count=8)
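Listing the device proves enumeration works, but a tiny computation is a better smoke test, since broken kernels on an overridden arch often only surface when work is actually submitted. A sketch (the function name is my own; `torch.cuda.*` calls are the real API, which ROCm builds reuse, and the guards let it degrade to a message instead of crashing when torch or the GPU is absent):

```python
import importlib.util

def gpu_smoke_test(n=256):
    """Run a small matmul on the ROCm device; return a status string."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    if not torch.cuda.is_available():  # ROCm builds expose the cuda API
        return "no ROCm device visible"
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    c = a @ b
    torch.cuda.synchronize()  # surface any asynchronous kernel error here
    return f"ok: result on {c.device}"

print(gpu_smoke_test())
```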
@warmonkey Hey man, thanks for the gist!
For me, transformers (PyTorch) loads on gfx90c with both ROCm 5.7 and 6.4, but it is very unstable and I cannot find the reason why.
Problem happens on model loading.
Sometimes it is fine and works as expected, but sometimes it just freezes the display.
The mouse works, sound works, but the display is frozen and I cannot switch to a TTY either.
I'm using Ubuntu 24.04, an Asus B450 Prime Plus, and a Ryzen 5 5600G with Cezanne Radeon mobile graphics (gfx90c).
I allocated 16GB of VRAM in the BIOS.
Initially I thought the issue might be insufficient VRAM for the model, but no matter how much VRAM I allocate to the GPU in the BIOS, PyTorch still freezes the display from time to time...
When the model loads successfully, the GPU gives decent results: about 7-8 times faster than the CPU.
But this instability makes it unusable.
However, once a model has loaded successfully, it then runs stably.
I have tried different transformers model-loading configurations, but have found nothing so far.
What do you think about it?