Jordan Lazzaro JordanLazzaro

JordanLazzaro / grpo_demo.py

Created October 30, 2025 05:50 — forked from willccbb/grpo_demo.py

GRPO Llama-1B

	# train_grpo.py
	#
	# See https://github.com/willccbb/verifiers for ongoing developments
	#
	"""
	citation:

	@misc{brown2025grpodemo,
	title={Granular Format Rewards for Eliciting Mathematical Reasoning Capabilities in Small Language Models},
	author={Brown, William},