Guilherme Pires colobas

@JoaoLages
JoaoLages / RLHF.md
Last active October 21, 2024 06:06
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data
@alimanfoo
alimanfoo / zarr-links.ipynb
Last active February 28, 2024 19:01
How to create links with zarr
@yzh119
yzh119 / st-gumbel.py
Created January 12, 2018 12:25
ST-Gumbel-Softmax-Pytorch
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
def sample_gumbel(shape, eps=1e-20):
    U = torch.rand(shape).cuda()
    return -Variable(torch.log(-torch.log(U + eps) + eps))
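The preview above stops after the noise-sampling helper. As a rough illustration of what that helper is for, here is a minimal sketch in plain Python (standard library only, no PyTorch or CUDA) of drawing Gumbel(0, 1) noise with the same `-log(-log(U + eps) + eps)` transform and using it for the Gumbel-max trick. The name `gumbel_max_sample` and the `rng` parameter are assumptions made for this sketch and are not taken from the gist. The straight-through estimator itself is not shown.

```python
import math
import random


def sample_gumbel(n, eps=1e-20, rng=None):
    """Draw n samples from Gumbel(0, 1) using the inverse-CDF transform."""
    if rng is None:
        rng = random.Random()
    # U ~ Uniform[0, 1); eps guards both logarithms against log(0).
    return [-math.log(-math.log(rng.random() + eps) + eps) for _ in range(n)]


def gumbel_max_sample(logits, rng=None):
    """Hypothetical helper: sample an index from softmax(logits).

    Adding independent Gumbel(0, 1) noise to each logit and taking the
    argmax yields a draw from the categorical distribution softmax(logits).
    """
    noise = sample_gumbel(len(logits), rng=rng)
    perturbed = [logit + g for logit, g in zip(logits, noise)]
    return max(range(len(logits)), key=perturbed.__getitem__)
```

Passing a seeded `random.Random` instance as `rng` makes the draws reproducible, which is convenient when checking that the empirical frequencies approach the softmax probabilities.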
@julz
julz / main.go
Created November 20, 2015 12:39
containersched minicontainer
package main
import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)
func main() {
@meiamsome
meiamsome / hn_search.js
Last active May 4, 2022 13:23 — forked from kristopolous/hn_seach.js
hn job query search
/* Hacker News Search Script
 *
 * Original Script by Kristopolous:
 * https://gist.github.com/kristopolous/19260ae54967c2219da8
 *
 * Usage:
 * First, copy the script into your browser's console whilst on the Hacker News
 * jobs page. Then, you can use the query function to filter the results.
 *
 * For example,
@jboner
jboner / latency.txt
Last active November 16, 2024 21:28
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us    ~1GB/sec SSD