🤠

lewtun

Cowboy post-training @ Hugging Face

1.6k followers · 0 following

dwarkeshsp / nanochat_simple_rl.py

Created October 21, 2025 00:13

	"""
	Simple RL training script for teaching a model to add.
	Demonstrates REINFORCE and GRPO algorithms in a minimal implementation.

	If you want to run this script, put it inside of nanochat/scripts/ and run it with:
	python -m scripts.simple_rl

	First add "matplotlib>=3.9.0" to pyproject.toml and run 'uv sync'

	I wrote a separate script to download the weights for the model:

willccbb / grpo_demo.py

Last active July 21, 2026 14:34

GRPO Llama-1B

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	"""
	citation:

	@misc{brown2025grpodemo,
	title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
	author={Brown, William},