@cgmb
Last active January 26, 2025 00:51
Build llama.cpp on Ubuntu 24.04
#!/bin/sh
# Build llama.cpp on Ubuntu 24.04 with AMD GPU support
sudo apt -y install git wget hipcc libhipblas-dev librocblas-dev cmake build-essential
# ensure you have the necessary permissions by adding yourself to the video and render groups
sudo usermod -aG video,render $USER
# reboot to apply the group changes
# run rocminfo to check everything is working thus far
rocminfo
# if it printed information about your GPU, that means it's working
# if you see an error message, fix the problem before continuing
# download a model
wget --continue https://huggingface.co/TheBloke/dolphin-2.2.1-mistral-7B-GGUF/resolve/main/dolphin-2.2.1-mistral-7b.Q5_K_M.gguf?download=true -O dolphin-2.2.1-mistral-7b.Q5_K_M.gguf
# build llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
git checkout b3267
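# configure the build to use the HIP backend, with clang++-17 as the HIP compiler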
HIPCXX=clang++-17 cmake -H. -Bbuild -DGGML_HIPBLAS=ON -DCMAKE_BUILD_TYPE=Release
make -j16 -C build
# run llama.cpp
build/bin/llama-cli -ngl 32 --color -c 2048 --temp 0.7 --repeat_penalty 1.1 -n -1 -m ../dolphin-2.2.1-mistral-7b.Q5_K_M.gguf --prompt "Once upon a time"
@cgmb (Author) commented Nov 24, 2024

If you wish to run llama.cpp in a Docker container, ensure the GPU devices are passed through:

docker run -it --device=/dev/dri \
               --device=/dev/kfd \
               --security-opt seccomp=unconfined \
               --group-add=video \
               --group-add=$(getent group render | cut -d: -f3) \
               ubuntu:noble

The $(getent group render | cut -d: -f3) adds the render group by its numeric GID, because the group name will not exist within the container at launch.
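For example, on the host that substitution resolves to the render group's numeric GID (the value varies from system to system; 110 below is only an illustration):

getent group render
# prints something like: render:x:110:yourname
# so the flag would expand to --group-add=110 on that system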

The official docs for this can be found at https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#accessing-gpus-in-containers
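As a quick sanity check inside the container (just a sketch, not part of the gist), you can install rocminfo from the Ubuntu repositories and confirm the GPU is visible before going further:

apt update && apt -y install rocminfo
rocminfo
# if it lists your GPU agents, the device passthrough is working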

@MarioIshac commented
Thank you, this worked fantastically. I am curious why -DAMDGPU_TARGETS=gfx{...} is not used, compared to the official guide. What is the behavior? Is it choosing whatever it detects?

@cgmb (Author) commented Jan 26, 2025

@MarioIshac, the official guide is out of date. The -DAMDGPU_TARGETS flag only affects the hip::device target provided by find_package(hip); that is the mechanism you would use to choose the target if you were building with CXX=hipcc. However, llama.cpp switched to using CMake's built-in support for the HIP language, with HIPCXX=clang++ and enable_language(hip). Target selection for that mechanism is controlled by the -DCMAKE_HIP_ARCHITECTURES flag.
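For example, a configure command for this gist's build that pins the architecture explicitly might look like the following (gfx1030 is only a placeholder; substitute the target for your own GPU):

HIPCXX=clang++-17 cmake -H. -Bbuild -DGGML_HIPBLAS=ON -DCMAKE_HIP_ARCHITECTURES=gfx1030 -DCMAKE_BUILD_TYPE=Release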

If AMDGPU_TARGETS is not specified, hipcc will detect your GPU and build for that target. Likewise, if CMAKE_HIP_ARCHITECTURES is not specified, CMake will detect your GPU and build for that target. In the official guide you linked, the build uses HIPCXX and leaves CMAKE_HIP_ARCHITECTURES unset, so it will autodetect the architecture. Since hipcc is not used, the fact that the guide sets AMDGPU_TARGETS is irrelevant.

For end users, it's better not to specify the GPU architecture at all. Letting the build autodetect it means I don't need to teach users how to determine their GPU architecture in order to specify it manually. Only power users building binaries to distribute to other people really need to learn that.
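If you do need the architecture name, one way to find it (assuming rocminfo is installed, as in the script above) is to look for the gfx identifier in its output:

rocminfo | grep -o 'gfx[0-9a-f]*' | sort -u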
