Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language-model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much