@kalomaze
Last active February 28, 2025 05:32
GRPO Reinforcement Learning - 7b GSM8k on 8xH100 / 8xA100
# the "verifiers" repository is a clean implementation of templated GRPO reinforcement learning training environments
# this is a generic set of "install from scratch" commands, complete with a deepspeed z3 (ZeRO stage 3) config, that i have been using when i spin up nodes
# it runs the gsm8k example w/ default batch size & generation size (8); 7 GPUs train (hence `--num_processes 7` below) and the 8th GPU is used for vllm generations
# qwen 14b full finetuning also runs on this configuration without LoRA and without CUDA OOM, at least for the gsm8k task's context sizes + generation lengths
# hyperparameters are controlled by `verifiers/utils/config_utils.py`; i have been preferring extreme grad clipping (between 0.001 and 0.01) and low beta (under 0.01)
# NOTE FEB 27: examples now live in `verifiers/examples`, not `/examples`
cd /root
mkdir boom
cd boom
git clone https://github.com/willccbb/verifiers.git
cd verifiers
rm -rf .venv
uv venv -p python3.11
source .venv/bin/activate
uv pip install setuptools psutil
uv pip install torch==2.5.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
uv pip install -e . --no-build-isolation
uv pip install wandb
mkdir -p configs/deepspeed
mkdir -p configs/accelerate
wget -P configs/deepspeed https://gist.githubusercontent.com/kalomaze/36b71be9f81151e1a6e2c6d8332c84ad/raw/e9f4b9352829929c6ac16492a4d4663a76ea8bc3/stage3.json
wget -P configs/accelerate https://gist.githubusercontent.com/kalomaze/2562e68017be328eefdedf930d39e8be/raw/11281bc48254206cc47d01299bbdce3d1c6d2add/deepspeed.yaml
accelerate launch --config_file configs/accelerate/deepspeed.yaml --num_processes 7 verifiers/examples/gsm8k_simple.py
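# the launch line above hardcodes 7 training processes because the 8th GPU is reserved for vllm.
# a minimal sketch for nodes with a different GPU count follows; `train_procs` is a hypothetical
# helper (not part of verifiers or accelerate), and the "one GPU for vllm" split is assumed from the notes above.

```shell
# hypothetical helper: training processes = total GPUs minus the one reserved for vllm
train_procs() {
  total=$1
  if [ "$total" -lt 2 ]; then
    echo "need at least 2 GPUs (1 for vllm, 1 or more for training)" >&2
    return 1
  fi
  echo $((total - 1))
}

# on a real node, count GPUs with nvidia-smi and pass the derived value to accelerate:
# NUM_GPUS=$(nvidia-smi -L | wc -l)
# accelerate launch --config_file configs/accelerate/deepspeed.yaml \
#   --num_processes "$(train_procs "$NUM_GPUS")" verifiers/examples/gsm8k_simple.py
```

# on an 8-GPU node this reproduces `--num_processes 7`; whether the example and deepspeed config behave well at other GPU counts is untested here.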
@deter3

deter3 commented Feb 28, 2025

Using RunPod and following the commands above, `uv pip install -e .` raised an error. I guess it's a problem in the verifiers code.
