@younesbelkada
younesbelkada / finetune_llama_v2.py
Last active April 7, 2025 18:27
Fine tune Llama v2 models on Guanaco Dataset
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and follow-up large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning", learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@dypsilon
dypsilon / frontendDevlopmentBookmarks.md
Last active May 5, 2025 13:05
A badass list of frontend development resources I collected over time.
@timoxley
timoxley / client.js
Created March 23, 2013 06:38
Uncaught Error: Unexpected "\u0000" at position 0 in state START
var url = require('url')
var websocket = require('websocket-stream')
var engine = require('voxel-engine')
var duplexEmitter = require('duplex-emitter')
console.log('starting', 1)
var socket = websocket('ws://' + url.parse(window.location.href).host)
var emitter = duplexEmitter(socket)
@chrislaskey
chrislaskey / iptables.rules
Created January 15, 2013 21:37
Example generic iptables rules file for forwarding traffic from port 80 to port 8080, used in this case for configuring Jenkins/Hudson to run on port 80. The initial commit is the default iptables rules output from `iptables-save > /etc/iptables.up.rules` on Ubuntu Server 12.04. The revised commit contains the updated rules to forward traffic from port 80…
## Filter Table
*filter
:INPUT ACCEPT [971:197590]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [95:9682]
-A INPUT -i eth0 -p tcp --dport 80 -j ACCEPT
-A INPUT -i eth0 -p tcp --dport 8080 -j ACCEPT
COMMIT
window.addEventListener('load', function() {
var section = document.querySelector('div.main'),
args = document.querySelector('div.arguments'),
image = args.querySelector('img'),
video = args.querySelector('video'),
log = $('log');
//check support
if (!supportsWebGL()) {
log.innerHTML = '<p class=\'error\'>Your browser doesn\'t seem to support WebGL. More info <a href=\'http://get.webgl.org/\'>here</a>.</p>';
@noboko
noboko / app.js
Created April 24, 2012 17:07
Learning WebGL for Plask Lesson 16
//Learning WebGL for Plask Lesson16
//Learning WebGL : http://learningwebgl.com/blog/?page_id=1217
//Github : https://github.com/gpjt/webgl-lessons (this script uses 'earth.jpg' , 'crate.gif' and 'macbook.json')
//
//Plask : http://www.plask.org/
//
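//I used my own modified version, with Mat3, Mat4.toInverseMat3() and similar additions.
//To distinguish it from the original plask.js, it is loaded from the same folder.
//Note that plask.js seems to differ slightly between the binary release and the latest GitHub version,
//so plask_.js is for the binary release and plask.js is for the GitHub version.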

High level style in javascript.

Opinions are like assholes: everyone has got one.

This one is mine.

Punctuation: who cares?

Punctuation is a bikeshed. Put your semicolons, whitespace, and commas where you like them.

@noboko
noboko / app.js
Created April 14, 2012 14:17
Learning WebGL for Plask Lesson 15
//Learning WebGL for Plask Lesson15
//Learning WebGL : http://learningwebgl.com/blog/?page_id=1217
//Github : https://github.com/gpjt/webgl-lessons (this script uses 'earth.jpg' and 'earth-specular.gif')
//
//KeyMap
//zoom : qe,
//lighting : t,
//colorMap : y
//useColorMap : u,
//reset : r,
@noboko
noboko / app.js
Created April 14, 2012 11:46
Learning WebGL for Plask lesson 14
//Learning WebGL for Plask Lesson14
//Learning WebGL : http://learningwebgl.com/blog/?page_id=1217
//Github : https://github.com/gpjt/webgl-lessons (this script uses 'earth.jpg' , 'arroway.de_metal+structure+06_d100_flat.jpg' and 'Teapot.json')
//zoom : qe,
//lighting : v,
//useFragmentLighting : f
//useTexture : c,
//reset : r,
//specularColor : tgyhuj,
//