Xuemin Zhao xmzhao

  • Tencent
  • Chengdu, China

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language-model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@harubaru
harubaru / wd1-3-release.md
Last active April 16, 2025 14:17
Official Release Notes for Waifu Diffusion 1.3
@nlohmann
nlohmann / remove_empty_elements.py
Created March 12, 2018 14:19
Remove empty arrays, objects or null elements from a JSON value
def remove_empty_elements(d):
    """recursively remove empty lists, empty dicts, or None elements from a dictionary"""

    def empty(x):
        return x is None or x == {} or x == []

    if not isinstance(d, (dict, list)):
        return d
    elif isinstance(d, list):
        return [v for v in (remove_empty_elements(v) for v in d) if not empty(v)]
    else:
        # dict case: recurse into the values and drop keys whose cleaned value is empty
        return {k: v for k, v in ((k, remove_empty_elements(v)) for k, v in d.items()) if not empty(v)}
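
For illustration, a quick usage example (the input dict here is made up, not part of the gist):

# hypothetical input, e.g. the result of json.loads()
doc = {"a": [], "b": {"c": None, "d": 1}, "e": [None, {}, 2]}
print(remove_empty_elements(doc))  # {'b': {'d': 1}, 'e': [2]}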
@bschlinker
bschlinker / timestampWithMs.cpp
Created February 5, 2018 18:32
Timestamp with milliseconds
#include <chrono>
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>
std::string getTimestamp() {
  // get a precise timestamp as a string, formatted as YYYY-MM-DD HH:MM:SS.mmm
  const auto now = std::chrono::system_clock::now();
  const auto nowAsTimeT = std::chrono::system_clock::to_time_t(now);
  const auto nowMs = std::chrono::duration_cast<std::chrono::milliseconds>(
      now.time_since_epoch()) % 1000;
  std::stringstream nowSs;
  nowSs << std::put_time(std::localtime(&nowAsTimeT), "%Y-%m-%d %H:%M:%S")
        << '.' << std::setfill('0') << std::setw(3) << nowMs.count();
  return nowSs.str();
}
@shagunsodhani
shagunsodhani / SkipThoughtVectors.md
Created December 3, 2016 09:36
Notes for Skip-Thought Vectors paper

Skip-Thought Vectors

Introduction

  • The paper describes an unsupervised approach to train a generic, distributed sentence encoder.
  • It also describes a vocabulary expansion method to encode words not seen at training time (a toy sketch of this idea appears right after this list).
  • Link to the paper
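
A toy sketch of the vocabulary-expansion idea above (not from the notes or the paper's code; the array names are hypothetical): learn a linear map from a large word2vec space into the encoder's embedding space using the words both vocabularies share, then apply it to words the encoder never saw.

import numpy as np

def expand_vocabulary(w2v_shared, rnn_shared, w2v_unseen):
    """Embed unseen words in the encoder's space via a learned linear map.

    w2v_shared, rnn_shared: embeddings of the words both vocabularies contain.
    w2v_unseen: word2vec embeddings of words outside the encoder's vocabulary.
    """
    # least-squares fit of rnn_shared ≈ w2v_shared @ W
    W, *_ = np.linalg.lstsq(w2v_shared, rnn_shared, rcond=None)
    return w2v_unseen @ W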

Skip-Thoughts

@fahadysf
fahadysf / README.md
Last active January 26, 2023 20:50
A multiprocess task broker which accepts and provides status reports for tasks using JSON REST API calls.

Multiprocess Task Broker with REST API

This gist shows an example of an asynchronous multiprocess task broker which can take job requests and report on running jobs via a minimal REST API.

Adapted from https://gist.github.com/nitaku/10d0662536f37a087e1b

All of the caveats from the original author still apply.
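
As a rough illustration of the pattern the description names (this is not the gist's code, which is not shown here), here is a standard-library-only sketch: worker processes run jobs, a Manager dict shared between them tracks status, and a tiny HTTP handler exposes that dict as JSON. The endpoint behaviour, port, and the fake workload are all assumptions.

import json
import multiprocessing
import time
import uuid
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_task(jobs, job_id, seconds):
    """Placeholder worker: pretend to do some work, then mark the job done."""
    jobs[job_id] = "running"
    time.sleep(seconds)
    jobs[job_id] = "done"

class BrokerHandler(BaseHTTPRequestHandler):
    jobs = None  # shared status dict, assigned in __main__

    def _reply(self, payload):
        body = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # POST starts a new background task and returns its id
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = "queued"
        multiprocessing.Process(target=run_task, args=(self.jobs, job_id, 5)).start()
        self._reply({"id": job_id, "status": "queued"})

    def do_GET(self):
        # GET reports the status of every known task
        self._reply(dict(self.jobs))

if __name__ == "__main__":
    BrokerHandler.jobs = multiprocessing.Manager().dict()
    HTTPServer(("127.0.0.1", 8080), BrokerHandler).serve_forever()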

@wangkuiyi
wangkuiyi / learn-mpi.go
Created December 18, 2015 18:55
Build and run an MPI program in Go
// This is a sample MPI program in Go.
//
// To build and run it, we need to install MPI. I downloaded and built
// Open MPI 1.8.8:
//
// wget https://www.open-mpi.org/software/ompi/v1.8/downloads/openmpi-1.8.8.tar.bz2
// tar xjf openmpi-1.8.8.tar.bz2
// cd openmpi-1.8.8
// ./configure --prefix=/home/yi/openmpi
// make -j2 install
@n8thangreen
n8thangreen / covariate shift.R
Last active May 28, 2018 01:53
empirical and model based (logistic) training sample adjustment
covariateShift <- function(data, resla, riskfac, ssize=10000){
  ## importance sampling approach, used when the
  ## training and test data come from different distributions
  require(plyr)
  Natsal.riskfac.table <- DistnTable(data, riskfac)
  Natsal.riskfac.table <- colNameReplace(Natsal.riskfac.table, "(all)", "Natsalfreq")
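
For comparison, a small Python sketch of the same general idea (importance weighting under covariate shift) rather than a translation of the R function above: a logistic-regression "domain classifier" is trained to tell training rows from test rows, and its predicted odds give per-sample weights for refitting the downstream model. scikit-learn and the variable names here are assumptions, not part of the gist.

import numpy as np
from sklearn.linear_model import LogisticRegression

def covariate_shift_weights(X_train, X_test):
    """Estimate w(x) = p_test(x) / p_train(x) with a domain classifier."""
    X = np.vstack([X_train, X_test])
    y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])  # 0 = train, 1 = test
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_test = clf.predict_proba(X_train)[:, 1]            # P(domain = test | x)
    weights = p_test / np.clip(1.0 - p_test, 1e-6, None)
    weights *= len(X_train) / len(X_test)                # correct for sample-size imbalance
    return weights

# the returned weights can be passed as sample_weight when refitting the downstream model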
@opensourcegeek
opensourcegeek / connection_pool
Created March 28, 2014 00:13
A quick and easy way to reconnect to MySQL when the connection is lost. It uses gevent queues, but the idea should work however the connections are pooled.
from gevent import monkey
monkey.patch_socket()
import logging
import gevent
from gevent.queue import Queue
import pymysql as db
logging.basicConfig(level=logging.DEBUG)
LOGGER = logging.getLogger("connection_pool")
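
A minimal sketch of the idea described above (not the gist's own code, which is cut off here), reusing the Queue and pymysql imports from the snippet: pooled connections sit in a gevent Queue, and each checkout pings the server so a dropped connection is transparently re-opened. The connection parameters are placeholders.

POOL = Queue()

def new_connection():
    # placeholder credentials
    return db.connect(host="127.0.0.1", user="user", password="secret", database="test")

def init_pool(size=5):
    for _ in range(size):
        POOL.put(new_connection())

def run_query(sql):
    conn = POOL.get()              # block until a pooled connection is free
    try:
        conn.ping(reconnect=True)  # re-open the connection if the server dropped it
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        POOL.put(conn)             # always hand the connection back to the pool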