Malcolm Greaves malcolmgreaves

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

Native Secure Enclave backed SSH keys on macOS

It turns out that macOS Tahoe can generate and use Secure Enclave backed SSH keys! This replaces projects like https://github.com/maxgoedjen/secretive.

There is a shared library, /usr/lib/ssh-keychain.dylib, that has traditionally been used to add smartcard support to ssh by implementing the PKCS11Provider interface. Recently, however, it has also started implementing SecurityKeyProvider, which supports loading keys directly from the Secure Enclave! SecurityKeyProvider is what is normally used to talk to FIDO2 devices (e.g. libfido2 can be used to talk to your YubiKey). However, you can now use it to talk to your Secure Enclave instead!
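As a rough sketch of how one might point OpenSSH at this library: `SecurityKeyProvider` is a standard `ssh_config` keyword that takes a library path, so a stanza like the one printed below should be the general shape. The helper name `sk_provider_stanza` and the `Host *` scope are illustrative assumptions, and this has not been verified against macOS Tahoe itself.

```shell
#!/bin/sh
# Path taken from the text above; everything else here is illustrative.
provider="/usr/lib/ssh-keychain.dylib"

# Print an ssh_config stanza that sets SecurityKeyProvider to the given path.
# (Hypothetical helper; the "Host *" scope is an assumption.)
sk_provider_stanza() {
  printf 'Host *\n    SecurityKeyProvider %s\n' "$1"
}

if [ -e "$provider" ]; then
  # Review the output before appending it to ~/.ssh/config by hand.
  sk_provider_stanza "$provider"
else
  echo "provider library not found at $provider" >&2
fi
```

The script only prints the stanza rather than editing `~/.ssh/config`, so nothing changes until you choose to paste it in.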

	from __future__ import absolute_import, division, print_function

	import argparse
	import glob
	import logging
	import os
	import random

	import numpy as np
	import torch

	alias dockerfile='script.sh'


	script.sh:

	#!/bin/bash
	echo "FROM scratch"
	docker history --no-trunc "$@" | tac | tr -s ' ' | cut -d " " -f 5- | sed 's,^/bin/sh -c #(nop) ,,g' | sed 's,^/bin/sh -c,RUN,g' | sed 's, && , \\\n & ,g' | sed 's,\s[0-9][\.][0-9]\s[kMG]B\s*$,,g' | head -n -1
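The two `sed` rewrites in the script above that turn history entries back into Dockerfile instructions can be tried without Docker. The sketch below feeds them two made-up history lines; the function name `rewrite_history_line` and the sample input are hypothetical, and real `docker history` output may differ in spacing and columns.

```shell
#!/bin/sh
# Hypothetical helper wrapping two of the sed steps from the script above.
rewrite_history_line() {
  # Metadata steps are recorded as "/bin/sh -c #(nop) <INSTRUCTION>": drop the prefix.
  # Any remaining "/bin/sh -c <command>" entry was a RUN instruction.
  sed 's,^/bin/sh -c #(nop) ,,g' | sed 's,^/bin/sh -c,RUN,g'
}

# Made-up sample lines, already squeezed to single spaces as `tr -s ' '` would do.
printf '%s\n' \
  '/bin/sh -c #(nop) CMD ["bash"]' \
  '/bin/sh -c apt-get update' | rewrite_history_line
# prints:
#   CMD ["bash"]
#   RUN apt-get update
```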

	export CONTAINER_URI="gcr.io/deeplearning-platform-release/experimental.theia.1-7"
	export INSTANCE_NAME=...
	export PROJECT_NAME=...
	export IMAGE_PROJECT="deeplearning-platform-release"
	export IMAGE_FAMILY="theia-container-experimental"
	export MACHINE_TYPE=... #"n1-standard-4"
	export ZONE=... #"us-central1-a"
	gcloud compute instances create "${INSTANCE_NAME}" \
	--project="${PROJECT_NAME}" \
	--zone="${ZONE}" \

	export CONTAINER_URI="gcr.io/deeplearning-platform-release/experimental.theia.1-7"
	export INSTANCE_NAME=...
	export PROJECT_NAME=...
	export IMAGE_PROJECT="deeplearning-platform-release"
	export IMAGE_FAMILY="theia-container-experimental"
	export MACHINE_TYPE=... #"n1-standard-4"
	export ZONE=... #"us-central1-a"
	gcloud notebooks instances create "${INSTANCE_NAME}" \
	--project="${PROJECT_NAME}" \
	--location="${ZONE}" \