The following was tested on Google Cloud (GCP) using a g2-standard-48
instance and a Rocky Linux 9 image (GCP optimized, x86_64).
It has 192 GB RAM, 48 vCPU cores, and 4 NVIDIA L4 24 GB GPUs attached.
A 384 GB SSD disk or larger is recommended, depending on the workload.
NOTICE: Make sure you have a positive bank balance before trying this.
Update the system:
sudo dnf update -y
Install my favorite editor:
sudo dnf install -y nano
Install some basic development tools:
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y python3-pip
Next you need to install drivers for your GPU. I am of course using the NVIDIA L4 GPUs on this instance,
but this should work for almost any recent NVIDIA datacenter GPU.
Enable the CRB repository, add the EL9-compatible EPEL repositories (Fedora), and add the NVIDIA CUDA repository:
sudo dnf config-manager --set-enabled crb
sudo dnf install -y \
https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm \
https://dl.fedoraproject.org/pub/epel/epel-next-release-latest-9.noarch.rpm
sudo dnf config-manager --add-repo \
http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo
Install some monitoring and execution tools:
sudo dnf install -y htop tmux
Install driver dependencies:
sudo dnf install -y \
kernel-headers-$(uname -r) kernel-devel-$(uname -r) \
tar bzip2 make automake gcc gcc-c++ \
pciutils elfutils-libelf-devel libglvnd-opengl libglvnd-glx libglvnd-devel acpid pkgconfig dkms
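With pciutils installed you can confirm the GPUs are visible on the PCI bus before installing the driver:
lspci | grep -i nvidia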
Install NVIDIA GPU driver:
sudo dnf module install -y nvidia-driver:latest-dkms
Now it's a good time to reboot the system:
sudo reboot
Check the driver installation worked:
nvidia-smi
You should see something like this a second later:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.28.03              Driver Version: 560.28.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   62C    P8             14W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L4                      Off |   00000000:00:04.0 Off |                    0 |
| N/A   61C    P8             15W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L4                      Off |   00000000:00:05.0 Off |                    0 |
| N/A   54C    P8             13W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L4                      Off |   00000000:00:06.0 Off |                    0 |
| N/A   59C    P8             14W /   72W |       1MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
Install the Hugging Face CLI tool using pip:
pip install -U "huggingface_hub[cli]"
Set your Hugging Face access token:
mkdir -p ~/.bashrc.d \
&& echo 'export HUGGINGFACE_TOKEN=<your_access_token>' >> ~/.bashrc.d/hf \
&& source ~/.bashrc.d/hf
Configure git credential storage:
git config --global credential.helper store
Login with CLI tool:
huggingface-cli login --token $HUGGINGFACE_TOKEN --add-to-git-credential
You should see something similar after login:
Token is valid (permission: fineGrained).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /home/user/.cache/huggingface/token
Login successful
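You can double-check the login at any time with:
huggingface-cli whoami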
Create new project directory:
mkdir llama31_playground && cd llama31_playground
Create requirements.txt (it's in this gist):
nano requirements.txt
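The actual requirements.txt is in this gist; roughly it is the usual Transformers stack, something along these lines (an illustrative sketch, versions omitted):
# requirements.txt (sketch) - the real list is in this gist
torch
transformers
accelerate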
Create a Python virtual environment and install the requirements:
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
deactivate
Create hello.py (it's in this gist):
nano hello.py
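The actual hello.py is in this gist. As a rough idea, a minimal sketch could look like the following, assuming the meta-llama/Meta-Llama-3.1-8B-Instruct checkpoint and the transformers pipeline API (the model name and prompt here are placeholders, not necessarily what the gist uses):

# hello.py (sketch) - assumes a gated Llama 3.1 checkpoint, which needs the HF token set above
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # spread the model across all visible GPUs
)

messages = [{"role": "user", "content": "Say hello to the world in one sentence."}]
out = pipe(messages, max_new_tokens=64)
print(out[0]["generated_text"][-1]["content"])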
Run a 'hello world' program:
tmux
source env/bin/activate
python hello.py
exit
Install Git LFS (large file support for git):
sudo dnf install -y git-lfs
git lfs install
Clone LLaMA-13b model weights:
git clone https://huggingface.co/huggyllama/llama-13b
Create Vicuna-13b weights output directory:
mkdir vicuna-13b
Clone FastChat repository:
git clone https://github.com/lm-sys/FastChat.git && cd FastChat
Upgrade pip (to enable PEP 660 support):
pip3 install --upgrade pip
Install FastChat and its dependencies:
pip3 install -e .
Apply the delta weights (this will download the delta weights repository):
python3 -m fastchat.model.apply_delta \
--base-model-path ../llama-13b \
--target-model-path ../vicuna-13b \
--delta-path lmsys/vicuna-13b-delta-v1.1
Confirm weights output:
ls -alh ../vicuna-13b/
Test the model with the interactive CLI:
python3 -m fastchat.serve.cli --model-path ../vicuna-13b
tmux makes it easy to run and keep multiple processes alive. It was already installed earlier, but if you skipped that step:
sudo dnf install -y tmux
To run tmux just type tmux in the shell.
The first window is created automatically.
To create another window: ctrl + b, then c.
To switch windows: ctrl + b, then w, and pick the window with the arrow keys.
To detach: ctrl + b, then d.
To reattach the latest session, type tmux at in the shell.
Run each of the servers in a different tmux window so you can switch between
them and leave them running in interactive mode after you log out or disconnect.
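If you'd rather script this than create the windows by hand, here is a sketch using tmux's scripting commands (the session and window names are arbitrary; run it from the FastChat directory):
tmux new-session -d -s fastchat -n controller
tmux send-keys -t fastchat:controller 'python3 -m fastchat.serve.controller' C-m
tmux new-window -t fastchat -n worker
tmux send-keys -t fastchat:worker 'python3 -m fastchat.serve.model_worker --model-path ../vicuna-13b/' C-m
tmux new-window -t fastchat -n web
tmux send-keys -t fastchat:web 'python3 -m fastchat.serve.gradio_web_server' C-m
tmux attach -t fastchat
The steps below do the same thing one window at a time.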
Start the controller server:
python3 -m fastchat.serve.controller
Start the worker server (you can run multiple workers with different models):
python3 -m fastchat.serve.model_worker --model-path ../vicuna-13b/
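The instance has four L4 GPUs; if you want the worker to shard the model across them, model_worker accepts a --num-gpus flag (see python3 -m fastchat.serve.model_worker --help for all options):
python3 -m fastchat.serve.model_worker --model-path ../vicuna-13b/ --num-gpus 4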
Open the default web interface HTTP port (7860) in the firewall:
sudo firewall-cmd --add-port=7860/tcp
sudo firewall-cmd --add-port=7860/tcp --permanent
If you're using Google Cloud, you probably also need to allow ingress to port 7860 in your VPC firewall rules.
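For example, with the gcloud CLI (the rule name and the wide-open source range are just an illustration; scope it to your network and target tags as appropriate):
gcloud compute firewall-rules create allow-gradio-7860 \
  --direction=INGRESS --allow=tcp:7860 --source-ranges=0.0.0.0/0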
Start the GUI web interface:
python3 -m fastchat.serve.gradio_web_server
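Once it's up, the web UI should be reachable at http://<your_instance_external_ip>:7860.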