Robin rdoume

Commit jupyter notebooks code to git and keep output locally

Add a filter to git config by running the following command in bash inside the repo:

git config filter.strip-notebook-output.clean 'jupyter nbconvert --ClearOutputPreprocessor.enabled=True --to=notebook --stdin --stdout --log-level=ERROR'

Create a .gitattributes file inside the directory with the notebooks
Add the following to that file:

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology "instruction fine tuning", learning to immitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argumment which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

	import uuid
	import json
	import random

	import keras
	import numpy as np
	import tensorflow as tf
	import click

	from sklearn.base import BaseEstimator, ClassifierMixin
	from scipy.special import expit, logit


	class SoftLabelClassifier(BaseEstimator, ClassifierMixin):
	def __init__(self, regressor, eps=0.001):
	self.regressor = regressor
	self.eps = eps

	def fit(self, X, y=None):

	from matplotlib import pyplot
	import random
	import time

	pyplot.style.use("ggplot")
	now = time.time()

	def generate_user(censor=now):
	# Pick some point in time the user was created
	t_created = t = now - random.random() * 1e7

	/*
	* This query will run across an entire organization looking at tables across every project
	* and shows how they will compare on compressed and uncompressed storage.
	*
	* Region Notes:
	* This query will only read from a single region or multi-region at a time. It's
	* currently not possible to read this data from across all
	*
	* By default this reads from the US multi-region, so this might need to be changed if
	* your data lives elsewhere.

	/*
	* This query will look at the past 30 days of job history to analyze it for costs under
	* BigQuery Editions while utilizing the new autoscaling feature that was introduced.
	* It does this for those using both PAYG (Pay As You Go) and commitment models.
	* It will also compare this versus running the query with the on-demand model.
	*
	* Note that this query utilizes some math modeling behaviors that the BigQuery
	* autoscaler uses. Namely these are the up to 10 seconds "slot scale up time,"
	* the minimum of 60 seconds "slot scale down time," and the behavior that the
	* autoscaler scales up and down in factors of 100 slots for each job.