Skip to content

Instantly share code, notes, and snippets.

View rguo12's full-sized avatar
:octocat:

Ruocheng Guo rguo12

:octocat:
View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@neubig
neubig / dispatch_openai_requests.py
Last active February 19, 2024 17:55
A simple script to get results from the OpenAI Asynchronous API
# NOTE:
# You can find an updated, more robust and feature-rich implementation
# in Zeno Build
# - Zeno Build: https://github.com/zeno-ml/zeno-build/
# - Implementation: https://github.com/zeno-ml/zeno-build/blob/main/zeno_build/models/providers/openai_utils.py
import openai
import asyncio
from typing import Any
@kevinzakka
kevinzakka / spatial_softmax.py
Last active November 6, 2024 10:31
Pytorch Spatial Soft Argmax
import torch
import torch.nn as nn
import torch.nn.functional as F
class SpatialSoftArgmax(nn.Module):
"""Spatial softmax as defined in [1].
Concretely, the spatial softmax of each feature
map is used to compute a weighted mean of the pixel
@Windsooon
Windsooon / leetcode_retag.md
Last active November 3, 2025 06:46
Retag most popular Leetcode problems

osjobs

海外兔

website

@andrewjong
andrewjong / pytorch_image_folder_with_file_paths.py
Last active October 31, 2024 11:13
PyTorch Image File Paths With Dataset Dataloader
import torch
from torchvision import datasets
class ImageFolderWithPaths(datasets.ImageFolder):
"""Custom dataset that includes image file paths. Extends
torchvision.datasets.ImageFolder
"""
# override the __getitem__ method. this is the method that dataloader calls
def __getitem__(self, index):
@abridgland
abridgland / gaussian-processes-1.ipynb
Last active August 24, 2025 14:36
A Jupyter notebook to accompany Intro to Gaussian Processes - Part I at http://bridg.land/posts/gaussian-processes-1
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@baraldilorenzo
baraldilorenzo / readme.md
Last active September 13, 2025 12:17
VGG-16 pre-trained model for Keras

##VGG16 model for Keras

This is the Keras model of the 16-layer network used by the VGG team in the ILSVRC-2014 competition.

It has been obtained by directly converting the Caffe model provived by the authors.

Details about the network architecture can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, A. Zisserman

@mblondel
mblondel / letor_metrics.py
Last active September 19, 2024 06:13
Learning to rank metrics.
# (C) Mathieu Blondel, November 2013
# License: BSD 3 clause
import numpy as np
def ranking_precision_score(y_true, y_score, k=10):
"""Precision at rank k
Parameters
@coreylynch
coreylynch / bench_rocsgd.py
Created November 26, 2012 22:08 — forked from pprett/bench_rocsgd.py
Benchmark sklearn RankSVM implementations (now with sofia binding benchmarks)
import itertools
import numpy as np
from sklearn.linear_model import SGDClassifier, SGDRanking
from sklearn import metrics
from minirank.compat import RankSVM as MinirankSVM
from scipy import stats