pytorch/pytorch#94891 (comment) The PCIe atomics issue should already be fixed; I currently cannot verify it.
- Install PyTorch with ROCm support
Follow the official installation guide: https://pytorch.org/get-started/locally/#linux-installation
Choose [Stable] -> [Linux] -> [Pip] -> [Python] -> [ROCm]. The resulting command should be something like:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
Take note of the ROCm version used here (5.4.2 in this example).
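A quick way to confirm that the ROCm wheel (and not a CUDA/CPU wheel) was installed is to check torch.version.hip. This is just a minimal sketch, assuming the pip install above succeeded:
import torch
# The ROCm build of PyTorch reports a HIP version string here;
# CUDA/CPU builds report None instead.
print("torch:", torch.__version__)      # should contain a +rocm suffix
print("HIP/ROCm:", torch.version.hip)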
- Install ROCm drivers
- Download the installer package, which MUST BE THE SAME ROCM VERSION AS PYTORCH
https://repo.radeon.com/amdgpu-install/5.4.2/ubuntu/focal/
The file name should be: amdgpu-install_[version]_all.deb
- Install the deb package
sudo dpkg -i ./amdgpu-install*.deb
- Run the installation script
sudo amdgpu-install --usecase=graphics,rocm,opencl -y --accept-eula
Note: the Ryzen 7 5825U iGPU is Vega 8 architecture, which is supposed to use legacy OpenCL.
If you are using a different AMD GPU or APU, modifications may be required.
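As a rough post-install check (not part of the installer itself), you can verify from Python that the driver created the expected device nodes; these are the same paths the next step is about:
import glob
import os
# /dev/kfd is the ROCm compute interface, /dev/dri/renderD* are the render nodes.
print("/dev/kfd:", "present" if os.path.exists("/dev/kfd") else "missing")
print("render nodes:", glob.glob("/dev/dri/renderD*") or "none found")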
- Add current user to groups
To access the devices /dev/kfd, /dev/dri/card0, and /dev/dri/renderD*, the current user must be added to the render and video groups.
sudo usermod -a -G render $LOGNAME
sudo usermod -a -G video $LOGNAME
If the user is not added, only root will be able to use ROCm.
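A small sketch to confirm the group change took effect (the new groups only apply after the reboot / re-login in the next step); the group names are the ones used above:
import getpass
import grp
import os

user = getpass.getuser()
for group in ("render", "video"):
    # Look up the supplementary members of each group.
    try:
        members = grp.getgrnam(group).gr_mem
    except KeyError:
        members = []
    print(f"{user} in {group}: {user in members}")

# os.access reflects the groups of the current session, so this only
# returns True once you have logged in again after usermod.
print("can open /dev/kfd:", os.access("/dev/kfd", os.R_OK | os.W_OK))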
- Reboot the system
- Add environment variables in .bashrc
The Ryzen 7 5825U is gfx90c, which should be compatible with gfx900, so we force ROCm to treat it as gfx900.
export PYTORCH_ROCM_ARCH=gfx900
export HSA_OVERRIDE_GFX_VERSION=9.0.0
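If you prefer not to edit .bashrc, the same overrides can also be set from Python; a sketch with the same values as above:
import os

# Set these before importing torch so the HIP/HSA runtime sees them
# when it initializes, instead of the real (unsupported) gfx90c ID.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "9.0.0"
# PYTORCH_ROCM_ARCH mainly matters when building PyTorch from source,
# but setting it here is harmless.
os.environ["PYTORCH_ROCM_ARCH"] = "gfx900"

import torch
print(torch.cuda.is_available())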
- Check iGPU status
rocm-smi
From the output, you can see GPU[0].
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
ERROR: GPU[0] : sclk clock is unsupported
================================================================================
GPU[0] : Not supported on the given system
GPU  Temp (DieEdge)  AvgPwr  SCLK  MCLK     Fan  Perf  PwrCap       VRAM%  GPU%
0    43.0c           0.003W  None  1200Mhz  0%   auto  Unsupported  43%    0%
================================================================================
============================= End of ROCm SMI Log ==============================
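The sclk/power errors above appear to be harmless on this APU (the device still shows up and works). If you want to check for the GPU entry from a script instead of reading the table, a rough sketch, assuming rocm-smi is on the PATH:
import subprocess

# rocm-smi prints the clock errors alongside the table, so just look
# for the GPU[0] entry anywhere in the combined output.
result = subprocess.run(["rocm-smi"], capture_output=True, text=True)
print("GPU[0] listed:", "GPU[0]" in (result.stdout + result.stderr))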
You can also check the OpenCL status:
clinfo
From the output you can see that the GPU has been detected.
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3513.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name:
Device Topology: PCI[ B#4, D#0, F#0 ]
Max compute units: 8
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
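The same kind of check can be scripted; a rough sketch that greps the clinfo output for a GPU device (assuming clinfo is installed, which the driver step above takes care of):
import subprocess

# The APU should appear as a CL_DEVICE_TYPE_GPU device under the
# AMD Accelerated Parallel Processing platform.
output = subprocess.run(["clinfo"], capture_output=True, text=True).stdout
print("OpenCL GPU device found:", "CL_DEVICE_TYPE_GPU" in output)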
- Test run
import torch
print(torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_properties(i))
Output:
1
_CudaDeviceProperties(name='AMD Radeon Graphics', major=9, minor=0, total_memory=1024MB, multi_processor_count=8)
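As a further smoke test (a minimal sketch), you can run a small computation on the device to confirm that kernels actually execute:
import torch

# With the ROCm build, the "cuda" device name maps to the HIP/ROCm GPU.
device = torch.device("cuda:0")
a = torch.randn(256, 256, device=device)
b = torch.randn(256, 256, device=device)
c = a @ b
print(c.sum().item())   # any finite number means the matmul ran on the iGPU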
@warmonkey I hate Windows; I daily drive Linux (the latest Ubuntu). I dislike Windows because it is closed and much less optimized for high-performance tasks such as machine learning. I don't know why the people at AMD do not understand that they need to support Linux more than Windows, because Linux is where all the serious programming happens, not Windows. Windows is basically an OS for noobs, not for pros.