Gist kalomaze/37c70e022cb1e9428ebb1ee7a4b52275, last active February 28, 2025 05:32
GRPO Reinforcement Learning - 7B GSM8K on 8xH100 / 8xA100
```shell
# the "verifiers" repository is a clean implementation of templated GRPO reinforcement learning training environments
# this is a generic set of "install from scratch" commands, complete with a DeepSpeed ZeRO-3 config, that i have been using when i spin up nodes
# it runs the gsm8k example with the default batch size & generation size (8); the 8th GPU is used for vLLM generations
# full finetuning of Qwen 14B also runs on this configuration without LoRA or CUDA OOM, at least for the gsm8k task's context sizes + generation lengths
# hyperparameters are controlled by `verifiers/utils/config_utils.py`; i have been preferring extreme grad clipping (between 0.001 and 0.01) and low beta (under 0.01)
# NOTE FEB 27: examples have moved into `verifiers/examples`, not `/examples`

# fresh working directory and clone
cd /root
mkdir -p boom
cd boom
git clone https://github.com/willccbb/verifiers.git
cd verifiers

# clean Python 3.11 virtualenv managed by uv
rm -rf .venv
uv venv -p python3.11
source .venv/bin/activate

# --no-build-isolation (below) builds with packages already in the venv, so build-time deps go in first
uv pip install setuptools psutil
# PyTorch 2.5.1 wheels built for CUDA 12.1
uv pip install torch==2.5.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
uv pip install -e . --no-build-isolation
uv pip install wandb

# DeepSpeed ZeRO-3 config and accelerate launcher config
mkdir -p configs/deepspeed
mkdir -p configs/accelerate
wget -P configs/deepspeed https://gist.githubusercontent.com/kalomaze/36b71be9f81151e1a6e2c6d8332c84ad/raw/e9f4b9352829929c6ac16492a4d4663a76ea8bc3/stage3.json
wget -P configs/accelerate https://gist.githubusercontent.com/kalomaze/2562e68017be328eefdedf930d39e8be/raw/11281bc48254206cc47d01299bbdce3d1c6d2add/deepspeed.yaml

# 7 training processes; the remaining (8th) GPU serves vLLM generations
accelerate launch --config_file configs/accelerate/deepspeed.yaml --num_processes 7 verifiers/examples/gsm8k_simple.py
```
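The launch line hard-codes `--num_processes 7` for an 8-GPU node, keeping one GPU free for vLLM. A minimal sketch that derives the process count from the GPU count instead, assuming the same "one GPU for vLLM, the rest for training" split is wanted on nodes of other sizes (whether verifiers handles other node sizes this way is an assumption); `train_procs` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: number of training processes when one GPU is reserved for vLLM.
train_procs() {
  local total_gpus=$1
  if [ "$total_gpus" -lt 2 ]; then
    # need at least one training GPU plus the vLLM GPU
    echo "need at least 2 GPUs, got $total_gpus" >&2
    return 1
  fi
  echo $(( total_gpus - 1 ))
}

# Example on the GPU node (nvidia-smi -L prints one line per visible GPU):
#   NUM_GPUS=$(nvidia-smi -L | wc -l)
#   accelerate launch --config_file configs/accelerate/deepspeed.yaml \
#     --num_processes "$(train_procs "$NUM_GPUS")" verifiers/examples/gsm8k_simple.py
train_procs 8   # prints 7
```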
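The note that full 14B finetuning fits without OOM can be sanity-checked with the usual ZeRO rule of thumb. A rough sketch, assuming mixed-precision Adam at about 16 bytes per parameter (2 for bf16 weights, 2 for bf16 grads, 12 for fp32 master weights and the two Adam moments) fully partitioned across the 7 training GPUs; activations, temporary buffers, and the vLLM GPU are not counted, and `zero3_state_gb_per_gpu` is a hypothetical helper.

```shell
# Rough per-GPU model-state memory under ZeRO-3 with mixed-precision Adam.
# Assumption: 16 bytes/param, evenly sharded; activations and buffers are ignored.
zero3_state_gb_per_gpu() {
  local params_billion=$1   # model size in billions of parameters
  local train_gpus=$2       # number of training processes (one GPU each)
  local bytes_per_param=16
  # (billions of params) * (bytes per param) = total GB; integer-divide across GPUs
  echo $(( params_billion * bytes_per_param / train_gpus ))
}

zero3_state_gb_per_gpu 7 7    # 7B on 7 GPUs: prints 16
zero3_state_gb_per_gpu 14 7   # 14B on 7 GPUs: prints 32
```

About 32 GB of model state per GPU for 14B leaves headroom for activations on 80 GB H100s and A100s; on 40 GB A100s the margin would be much thinner.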
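The two hyperparameters singled out in the header can be written out explicitly. A sketch, assuming the "grad clipping" value is the threshold of standard clip-by-global-norm (what `torch.nn.utils.clip_grad_norm_` and the Hugging Face Trainer's `max_grad_norm` do, up to a small epsilon) and that `beta` is the KL coefficient of the GRPO objective as defined in the DeepSeekMath paper; the script itself does not show how `config_utils.py` maps these values.

```latex
% global-norm gradient clipping with threshold c
g \leftarrow g \cdot \min\!\left(1,\ \frac{c}{\lVert g \rVert_2}\right), \qquad c \in [0.001,\ 0.01]

% GRPO objective with outcome rewards; G = completions sampled per prompt q
\mathcal{J}(\theta) = \mathbb{E}\!\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{|o_i|}\sum_{t=1}^{|o_i|}
  \Big(\min\big(\rho_{i,t}\hat{A}_{i},\ \operatorname{clip}(\rho_{i,t},\,1-\varepsilon,\,1+\varepsilon)\,\hat{A}_{i}\big)
  - \beta\,\mathbb{D}_{\mathrm{KL}}\!\left[\pi_\theta \,\|\, \pi_{\mathrm{ref}}\right]\Big)\right]

\rho_{i,t} = \frac{\pi_\theta(o_{i,t}\mid q, o_{i,<t})}{\pi_{\theta_{\mathrm{old}}}(o_{i,t}\mid q, o_{i,<t})},
\qquad
\hat{A}_{i} = \frac{R_i - \operatorname{mean}(R_1,\dots,R_G)}{\operatorname{std}(R_1,\dots,R_G)}
```

With c between 0.001 and 0.01, any step whose gradient norm exceeds c is rescaled to norm exactly c, and a β under 0.01 keeps the weight on the KL pull toward the reference policy small. G presumably corresponds to the generation size of 8 mentioned in the header.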
Using RunPod and following these commands, `uv pip install -e .` raised an error; I guess it's a problem with the verifiers code.
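The comment above does not include the error message, so its cause can't be pinned down from here. A minimal preflight sketch, assuming a POSIX-compatible shell, that at least rules out missing tools before running the setup on a fresh node; `require_cmd` is a hypothetical helper, not part of verifiers, and the tool list simply mirrors the commands the script calls.

```shell
# Hypothetical helper: report a missing command and return non-zero.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing required command: $1" >&2
    return 1
  fi
}

missing=0
# git, uv and wget are called by the setup script; nvidia-smi confirms the NVIDIA driver is visible
for cmd in git uv wget nvidia-smi; do
  require_cmd "$cmd" || missing=1
done
if [ "$missing" -ne 0 ]; then
  echo "install the missing tools before running the setup" >&2
fi
```

If the tools are all present and the editable install still fails, the full output of `uv pip install -e . --no-build-isolation` is what identifies the failing package.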