Skip to content

Instantly share code, notes, and snippets.

View seanjensengrey's full-sized avatar

Sean Jensen-Grey seanjensengrey

View GitHub Profile

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

arxiv-ml

I have a tuple of 3 numbers that I would like to predict a final value. The first row is the one I would like to complete with the value ? the other two rows are training data. Could you find the missing value or a way to approximate it?

(2607, 2671, 1975) = ?
(2495, 2488, 1879) = 28644
(2269, 2263, 1597) = 26513

ChatGPT4 estimated the following number of papers for 2023

@sts10
sts10 / rust-command-line-utilities.markdown
Last active November 14, 2024 19:54
A curated list of command-line utilities written in Rust

A curated list of command-line utilities written in Rust

Note: I have moved this list to a proper repository. I'll leave this gist up, but it won't be updated. To submit an idea, open a PR on the repo.

Note that I have not tried all of these personally, and cannot and do not vouch for all of the tools listed here. In most cases, the descriptions here are copied directly from their code repos. Some may have been abandoned. Investigate before installing/using.

The ones I use regularly include: bat, dust, fd, fend, hyperfine, miniserve, ripgrep, just, cargo-audit and cargo-wipe.

  • atuin: "Magical shell history"
  • bandwhich: Terminal bandwidth utilization tool
@androidfred
androidfred / progressively_more_detail.md
Last active January 8, 2024 20:23
This "progressively more detail" image loading algorithm can help you do and learn anything

This "progressively more detail" image loading algorithm can help you do and learn anything

Summary, TLDR

When doing or learning pretty much anything, the strategy where you tackle the whole thing all at once, but starting with a rough outline and then incrementally revisiting everything to go into increasingly more detail, is often more effective than doing things perfectly little by little and missing the big picture. (pun intended)

Longer version

Back in the days of dial up internet, which was very limited in speed and capacity compared to modern internet connections, images on a web page could take very long to load. There were different strategies for dealing with that problem, and if you look hard enough (pun intended again), they can reveal some things about how to do and learn stuff.

@jthodge
jthodge / universal-switcher
Created September 6, 2020 22:07
Show macOS app switcher across all monitors
defaults write com.apple.Dock appswitcher-all-displays -bool true
killall Dock
@dabeaz
dabeaz / README.txt
Created October 15, 2019 20:10
PyCon India 2019, Code from Keynote Presentation by @dabeaz
Code from PyCon India 2019 Keynote Talk
David Beazley (https://www.dabeaz.com)
======================================
This code is presented "as is" and represents what was live-coded
during my closing keynote presentation at PyCon India, Chennai,
October 13, 2009. I have made no changes to the files.
Requires: Python 3.6+, numpy, pygame
@alexellis
alexellis / kvm_minikube.md
Last active July 26, 2024 01:47
Run multiple minikube Kubernetes clusters on Ubuntu Linux with KVM

Ramp up your Kubernetes development, CI-tooling or testing workflow by running multiple Kubernetes clusters on Ubuntu Linux with KVM and minikube.

In this tutorial we will combine the popular minikube tool with Linux's Kernel-based Virtual Machine (KVM) support. It is a great way to re-purpose an old machine that you found on eBay or have gathering gust under your desk. An Intel NUC would also make a great host for this tutorial if you want to buy some new hardware. Another popular angle is to use a bare metal host in the cloud and I've provided some details on that below.

We'll set up all the tooling so that you can build one or many single-node Kubernetes clusters and then deploy applications to them such as OpenFaaS using familiar tooling like helm. I'll then show you how to access the Kubernetes clusters from a remote machine such as your laptop.

Pre-reqs

  • This tutorial uses Ubuntu 16.04 as a base installation, but other distributions are supported by KVM. You'll need to find out how to install
Logo Memo 1 AIM-246.pdf A Computer Laboratory For Elementary Schools
Logo Memo 2 AIM-247.pdf Teaching Children Thinking
Logo Memo 3 AIM-248.pdf Twenty Things To Do With A Computer
Logo Memo 4 AIM-249.pdf Teaching Children To Be Mathematicians vs. Teaching About Mathematics
Logo Memo 5 AIM-254.pdf NIM: A Game-Playing Program
Logo Memo 6 AIM-264.pdf Developing A Musical Ear: A Ne
@maximecb
maximecb / music.csv
Created May 19, 2018 17:21
Python program to sequence MIDI music using a CSV spreadsheet
X kick snare
X
X
X
X kick
X
X
X
X kick snare clap
X

How Clojure's documentation can leapfrog other languages

Summary

I made a documentation generator that cashes in on Clojure's dynamism. See the play-cljs docs (a ClojureScript game library) for an example of its output.

The Problem

Like many of you, I've often wondered what my final regret will be on my deathbed. My best guess came to me in a dream recently. I was walking across the charred earth of an apocalyptic future world, maneuvering around the remains of the less fortunate. I was startled to find a young girl, barely holding onto her life. She murmured something to me. I asked her to repeat it, and she said more loudly: "I...wish your Clojure projects didn't have such crappy documentation."