
Aidan Do aidando73


Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

ufuk / AsyncConfiguration.java
Last active December 10, 2022 21:43
An easy way to disable the @Async annotation in test contexts. The same approach can be used to disable the @Scheduled annotation as well.
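package ...configuration;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.scheduling.annotation.EnableAsync;
// Enables @Async processing for every profile except "test", so in test
// contexts @Async methods run synchronously on the caller's thread.
@Configuration
@EnableAsync
@Profile("!test")
public class AsyncConfiguration {
}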
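The `@Profile("!test")` guard matters because, without `@EnableAsync`, Spring does not proxy `@Async` methods. They are simply called directly. The sketch below is a plain-Java analogue of that effect, not Spring's mechanism. Under a "test" profile, "async" work runs on the caller's thread before `execute()` returns, so a test can assert on its effects immediately. The class `ProfileAwareExecutor` and the method `forProfiles` are hypothetical names, and the profile set stands in for Spring's environment.

```java
import java.util.Set;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

// Hypothetical plain-Java analogue of @EnableAsync guarded by @Profile("!test"):
// when the "test" profile is active, submitted tasks run synchronously.
public class ProfileAwareExecutor {

    // Picks an executor from the active profiles (a stand-in for Spring's
    // profile mechanism, not its API).
    public static Executor forProfiles(Set<String> activeProfiles) {
        if (activeProfiles.contains("test")) {
            // Direct executor: the task runs on the caller's thread before
            // execute() returns, so tests need no waiting or polling.
            return Runnable::run;
        }
        // Otherwise run tasks in the background on the JDK's shared pool,
        // whose daemon threads need no explicit shutdown.
        return ForkJoinPool.commonPool();
    }

    public static void main(String[] args) {
        Thread caller = Thread.currentThread();
        Thread[] ranOn = new Thread[1];
        forProfiles(Set.of("test")).execute(() -> ranOn[0] = Thread.currentThread());
        System.out.println(ranOn[0] == caller); // prints true
    }
}
```

The same idea carries over to `@Scheduled`: if scheduling is only enabled outside the "test" profile, nothing fires on a timer during tests.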
rgl / wait_for_http_200.sh
Last active February 18, 2025 11:37
Wait for an HTTP endpoint to return 200 OK with Bash and curl
bash -c 'while [[ "$(curl -s -o /dev/null -w ''%{http_code}'' localhost:9000)" != "200" ]]; do sleep 5; done'
# also check https://gist.github.com/rgl/c2ba64b7e2a5a04d1eb65983995dce76
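The shell loop polls every 5 seconds with no upper bound. For JVM projects, a similar loop can be sketched with the JDK's `java.net.http.HttpClient` (Java 11+). This is a sketch under stated assumptions, not part of the original gist:

- The class `WaitFor200`, its methods and the `maxAttempts` cap are hypothetical.
- The status check is passed in as an `IntSupplier`, so the retry logic can be exercised without a live server.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.function.IntSupplier;

public class WaitFor200 {

    // Calls statusProbe until it reports 200, pausing `delay` between attempts.
    // Returns true on success, false after maxAttempts misses or if interrupted.
    public static boolean waitFor200(IntSupplier statusProbe, int maxAttempts, Duration delay) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (statusProbe.getAsInt() == 200) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                    return false;
                }
            }
        }
        return false;
    }

    // Sends a GET to url and yields its status code, or -1 when the request
    // fails (e.g. connection refused because the server is not up yet).
    // Redirects are not followed (HttpClient's default), matching curl without -L.
    public static IntSupplier httpProbe(HttpClient client, String url) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return () -> {
            try {
                return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
            } catch (IOException e) {
                return -1;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        };
    }
}
```

For example, `WaitFor200.waitFor200(WaitFor200.httpProbe(HttpClient.newHttpClient(), "http://localhost:9000"), 60, Duration.ofSeconds(5))` probes the same endpoint as the shell loop. It gives up after 60 attempts (59 pauses, just under five minutes). The cap is the main design difference: the original loop waits forever.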
tasdikrahman / .sqliterc
Created November 3, 2015 12:59
sqlite3 rc file that turns on column headers and column-aligned output. Place it in your home directory (~/).
.header on
.mode column
staltz / introrx.md
Last active May 15, 2025 10:37
The introduction to Reactive Programming you've been missing
TY - JOUR
TI - Unions, Norms, and the Rise in U.S. Wage Inequality
AU - Western, Bruce
AU - Rosenfeld, Jake
T2 - American Sociological Review
AB - From 1973 to 2007, private sector union membership in the United States declined from 34 to 8 percent for men and from 16 to 6 percent for women. During this period, inequality in hourly wages increased by over 40 percent. We report a decomposition, relating rising inequality to the union wage distribution’s shrinking weight. We argue that unions helped institutionalize norms of equity, reducing the dispersion of nonunion wages in highly unionized regions and industries. Accounting for unions’ effect on union and nonunion wages suggests that the decline of organized labor explains a fifth to a third of the growth in inequality—an effect comparable to the growing stratification of wages by education.
DA - 2011///
PY - 2011
DO - 10.1177/0003122411414817
DP - Highwire 2.0