

@cgmb
Last active April 18, 2025 01:57
Set up ROCm on a G4ad instance

How to set up ROCm 5.4.3 on an Ubuntu 22.04 G4ad instance

Install ROCm 5.4.3

sudo apt-get -y update
sudo apt-get -y upgrade
sudo apt-get -y install linux-modules-extra-aws
wget https://repo.radeon.com/amdgpu-install/5.4.3/ubuntu/jammy/amdgpu-install_5.4.50403-1_all.deb
sudo apt-get -y install ./amdgpu-install_5.4.50403-1_all.deb
sudo amdgpu-install --usecase=rocmdev
sudo usermod -a -G video,render ubuntu
sudo reboot

You can verify that this installation has been successful by running rocminfo and checking that gfx1011 is listed as one of the agents.
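If you want to script this check, the sketch below scans captured rocminfo output for a GPU agent. It is a plain C++ illustration, not part of ROCm: the function name lists_agent is hypothetical, and it assumes that each agent in rocminfo's output has a line of the form "Name:   gfx1011" (lines such as "Marketing Name:" are deliberately skipped). You would pass it the text printed by rocminfo, for example read from a saved file.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `rocminfo_output` contains a line
// whose key is exactly "Name:" and whose value equals `arch` (e.g. "gfx1011").
// ASSUMPTION: agent names appear as "  Name:                    gfx1011".
// Keys like "Marketing Name:" or "Vendor Name:" are ignored because, after
// trimming leading whitespace, they do not start with "Name:".
bool lists_agent(const std::string& rocminfo_output, const std::string& arch) {
    assert(!arch.empty());
    const std::string key = "Name:";
    std::istringstream in(rocminfo_output);
    std::string line;
    while (std::getline(in, line)) {
        // Skip leading spaces/tabs; ignore blank lines.
        std::size_t start = line.find_first_not_of(" \t");
        if (start == std::string::npos) continue;
        if (line.compare(start, key.size(), key) != 0) continue;
        // Locate the value after "Name:" and trim trailing whitespace.
        std::size_t vstart = line.find_first_not_of(" \t", start + key.size());
        if (vstart == std::string::npos) continue;
        std::size_t vend = line.find_last_not_of(" \t\r");
        if (line.compare(vstart, vend - vstart + 1, arch) == 0) return true;
    }
    return false;
}
```

If the assumed output format holds, lists_agent(text, "gfx1011") should return true on a correctly configured G4ad instance and false before the driver is working.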

Build and run a sample program

sudo apt-get install -y cmake build-essential g++-12
CXX=/opt/rocm/bin/hipcc cmake -S. -Bbuild -DAMDGPU_TARGETS=gfx1011 -DCMAKE_PREFIX_PATH=/opt/rocm
make -C build
./build/example

Note that AMD does not build the ROCm math libraries for gfx1011 when preparing their official packages, so if you want to use a library like rocSPARSE (whether directly in C++ or indirectly through a framework like PyTorch), you will have to build it from source yourself. The Spack package manager supports building for specific amdgpu architectures and is a useful tool for this purpose.

CMakeLists.txt

cmake_minimum_required(VERSION 3.16)
project(example LANGUAGES CXX)
find_package(hip REQUIRED)
add_executable(example main.cpp)
target_link_libraries(example PRIVATE hip::device)

main.cpp

#include <stdio.h>
#include <stdlib.h>
#include <hip/hip_runtime.h>

// Abort with the file, line, and error description if a HIP call fails.
#define CHECK_HIP(expr) do { \
    hipError_t result = (expr); \
    if (result != hipSuccess) { \
      fprintf(stderr, "%s:%d: %s (%d)\n", \
        __FILE__, __LINE__, \
        hipGetErrorString(result), result); \
      exit(EXIT_FAILURE); \
    } \
  } while(0)

// Square each element of arr in place, one thread per element.
__global__ void sq_arr(float *arr, int n) {
  int tid = blockDim.x*blockIdx.x + threadIdx.x;
  if (tid < n) {  // the block has more threads than there are elements
    arr[tid] = arr[tid] * arr[tid];
  }
}

int main() {
  enum { N = 5 };
  float hArr[N] = { 1, 2, 3, 4, 5 };
  float *dArr;
  CHECK_HIP(hipMalloc(&dArr, sizeof(float) * N));
  CHECK_HIP(hipMemcpy(dArr, hArr, sizeof(float) * N, hipMemcpyHostToDevice));
  // 1 block of 32 threads, no dynamic shared memory, default stream.
  sq_arr<<<dim3(1), dim3(32,1,1), 0, 0>>>(dArr, N);
  CHECK_HIP(hipGetLastError());  // report kernel launch errors
  CHECK_HIP(hipMemcpy(hArr, dArr, sizeof(float) * N, hipMemcpyDeviceToHost));
  for (int i = 0; i < N; ++i) {
    printf("%f\n", hArr[i]);
  }
  CHECK_HIP(hipFree(dArr));
  return 0;
}
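The kernel's indexing logic can be checked on the host without a GPU. The following is a sequential plain C++ sketch, not HIP code, and blocks_for and emulate_sq_arr are hypothetical names. It replays every (blockIdx.x, threadIdx.x) pair of a launch, computes the same global index as sq_arr, and shows why the tid < n guard matters: a single 32-thread block covers N = 5 elements, leaving 27 threads with nothing to do.

```cpp
#include <cassert>
#include <vector>

// Ceiling division: the number of blocks needed so that
// blocks * threads_per_block >= n.
int blocks_for(int n, int threads_per_block) {
    assert(n >= 0 && threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Sequential emulation of launching sq_arr with `grid` blocks of `block`
// threads. Each iteration computes the same index as the kernel
// (blockDim.x*blockIdx.x + threadIdx.x); the bounds check stops the
// surplus threads from writing past the end of the array.
void emulate_sq_arr(std::vector<float>& arr, int grid, int block) {
    const int n = static_cast<int>(arr.size());
    for (int block_idx = 0; block_idx < grid; ++block_idx) {
        for (int thread_idx = 0; thread_idx < block; ++thread_idx) {
            int tid = block * block_idx + thread_idx;
            if (tid < n) {
                arr[tid] = arr[tid] * arr[tid];
            }
        }
    }
}
```

If N grew beyond 32, a launch sized with something like dim3(blocks_for(N, 32)) blocks would keep every element covered, whereas the hard-coded single block would silently skip elements 32 and up.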
@jcao-ai

jcao-ai commented May 11, 2023

Hi, I followed all the instructions here on the same G4ad instance with a V520 GPU. The CMakeLists.txt project compiles, but when I run it, it crashes with a segmentation fault. Have you ever faced the same problem?

Hint from gdb:
Thread 1 "example" received signal SIGSEGV, Segmentation fault. 0x00007ffff64db335 in ?? () from /opt/rocm/hip/lib/libamdhip64.so.5

The crash happens at the kernel launch: sq_arr<<<dim3(1), dim3(32,1,1), 0, 0>>>(dArr, N);

@cgmb
Author

cgmb commented May 11, 2023

Oops. Thanks for pointing that out, @LitleCarl. There's a bug in the GPU architecture autodetection when using hip::device. These instructions were originally written for Debian (where the bug has already been patched), and I forgot to explicitly specify the GPU architecture when I adapted the instructions for Ubuntu.

I've updated the build command for the sample program. I hope that helps.
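The underlying requirement is that a HIP binary carries device code only for the architectures it was compiled for, so the GPU's architecture has to be among them. The plain C++ sketch below illustrates that comparison under stated assumptions; base_arch and built_for_device are hypothetical helpers. It assumes the device name looks like HIP's hipDeviceProp_t::gcnArchName (a base name such as gfx1011, optionally followed by colon-separated target features such as :xnack-) and that the target list uses CMake's semicolon-separated list syntax, as with -DAMDGPU_TARGETS. It compares base names only and ignores feature compatibility, which is a simplification.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Strip colon-separated target features, e.g. "gfx90a:sramecc+:xnack-" -> "gfx90a".
std::string base_arch(const std::string& gcn_arch_name) {
    return gcn_arch_name.substr(0, gcn_arch_name.find(':'));
}

// Hypothetical check: does a semicolon-separated target list
// (e.g. "gfx1030;gfx1011") include the device's base architecture?
// Feature flags such as xnack are ignored (simplification).
bool built_for_device(const std::string& amdgpu_targets,
                      const std::string& gcn_arch_name) {
    const std::string device = base_arch(gcn_arch_name);
    assert(!device.empty());
    std::istringstream in(amdgpu_targets);
    std::string target;
    while (std::getline(in, target, ';')) {
        if (base_arch(target) == device) return true;
    }
    return false;
}
```

In a real program the device string would come from hipGetDeviceProperties; it is passed in as a plain string here so the sketch compiles without ROCm.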

@jcao-ai

jcao-ai commented May 12, 2023

Works like a charm. Thanks! By the way, have you ever tried building rocBLAS from source for gfx1011? With the ROCm 5.4.3 tag, I haven't had any luck getting it to work. @cgmb

@jcao-ai

jcao-ai commented May 12, 2023

Well, I worked it out eventually. It's due to a change in the Tensile library that leads to an inconsistency:

https://github.com/ROCmSoftwarePlatform/Tensile/blob/aba52fa129099cd7c32b322f5daa1a586ad0792b/Tensile/TensileCreateLibrary.py#L1082

Because gfx1011 (a.k.a. Navi 12) only has fallback YAML files in logic_files, all of its logic files are discarded when SeparateArchitectures is enabled.

TL;DR: build with the command './install.sh -d -a gfx1011 --merge-architectures'.

@ggouaillardet

Thanks for the tip; it would make things much easier for me if I could use ROCm on AWS!

Did you get a chance to try more recent ROCm versions such as 6.3 or even 6.4.0?

@cgmb
Author

cgmb commented Apr 18, 2025

I have not tried the latest versions. I would expect them to work, but I don't know for sure.

I will note that the packages for the ROCm math libraries in the Ubuntu 24.04 'universe' and Debian 13 'main' repositories will run on gfx1011 without having to be rebuilt (as shown in Debian's Supported GPU List). Unfortunately, the packages are a bit old: they are mostly based on ROCm 5.5 or 5.7.

On AWS, even when using Ubuntu 24.04 or Debian 13 packages, you will need to install the amdgpu-dkms driver. This seems to be because AWS built its images with the amdkfd driver disabled. Anyway, once you have the driver installed, you can just apt install librocblas-dev libhipblas-dev rather than building the libraries from source. I ran the test suites for all the distro-provided ROCm packages on an XFX BC-160 and the results seem fine.

@ggouaillardet

Thanks for the quick reply!

I will give it a shot once I can get access to such an instance, and share my findings.
