Goals: add links to reasonable, well-written explanations of how things work. No hype and no vendor content if possible. Practical first-hand accounts of models in production are eagerly sought.
Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.
We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.
Reinforcement Learning from Human Feedback (RLHF) has been applied successfully in ChatGPT, which is why the technique has surged in popularity. 📈
RLHF is especially useful in two scenarios 🌟:
- You can’t create a good loss function
  - Example: how do you calculate a metric to measure whether the model’s output was funny?
- You want to train with production data, but you can’t easily label it (see the sketch after this list)
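In both cases the workaround is the same: instead of writing the objective down, learn it from human preference data, then optimize the model against it with PPO. Below is a minimal, hedged sketch of the two core losses, assuming PyTorch; the tensor names (`chosen_rewards`, `logprobs_new`, `advantages`, …) are illustrative placeholders rather than any particular library's API, and this is not ChatGPT's actual training code.

```python
import torch
import torch.nn.functional as F

def preference_loss(chosen_rewards, rejected_rewards):
    # Pairwise reward-model loss: push the score of the human-preferred response
    # above the score of the rejected one (a Bradley-Terry-style comparison).
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

def ppo_clip_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    # Clipped surrogate objective used in the PPO step. `advantages` come from the
    # learned reward signal (a KL penalty against the original model is omitted here).
    ratio = torch.exp(logprobs_new - logprobs_old)             # pi_new / pi_old per token
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()               # negative because optimizers minimize
```

The clipping is the piece of PPO that tends to cause the most confusion: it simply keeps each policy update close to the policy that generated the samples.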
ChatGPT appeared like an explosion on all my social media timelines in early December 2022. While I keep up with machine learning as an industry, I wasn't focused so much on this particular corner, and all the screenshots seemed like they came out of nowhere. What was this model? How did the chat prompting work? What was the context of OpenAI doing this work and collecting my prompts for training data?
I decided to do a quick investigation. Here's all the information I've found so far. I'm aggregating and synthesizing it as I go, so it's currently changing pretty frequently.
# to be used in conjunction with the functions defined here:
# https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/biggan_generation_with_tf_hub.ipynb
# party parrot transformation
noise_seed_A = 3   # right facing
noise_seed_B = 31  # left facing
num_interps = 14
truncation = 0.2
category = 14
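For context, here is a rough, self-contained sketch of the interpolation those settings drive, reusing the variables set above. The `truncated_noise` helper is hypothetical and written here only for illustration; the linked colab has its own sampling functions and runs each latent through BigGAN to render the frames.

```python
import numpy as np
from scipy.stats import truncnorm

def truncated_noise(seed, truncation, dim=128):
    # BigGAN-style latent vector: truncated normal noise scaled by the truncation value.
    state = np.random.RandomState(seed)
    return truncation * truncnorm.rvs(-2.0, 2.0, size=(dim,), random_state=state)

z_a = truncated_noise(noise_seed_A, truncation)  # right-facing parrot latent
z_b = truncated_noise(noise_seed_B, truncation)  # left-facing parrot latent

# One latent vector per frame, linearly blended from A to B; each frame would be
# passed to the generator (conditioned on `category`) to produce the animation.
alphas = np.linspace(0.0, 1.0, num_interps)
frames = [(1.0 - a) * z_a + a * z_b for a in alphas]
```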
See the official Differentiable Programming Manifesto instead.
- Operating system: Preferably Linux or macOS. If you have Windows, things may crash unexpectedly (try installing a virtual machine if you need to)
- RAM: Minimum 8 GB
- Disk space: Minimum 8 GB
Here is a list of the programs and libraries required for this lab session. (Please install them before coming to our lab session on Tuesday; this will save us a lot of time, and these are the same libraries you may need for your first assignment.)
- Python 3+ (Note: the lab and assignment will be done strictly using Python 3)
  - Install the latest version of Python 3 (a quick environment check is sketched below)
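As a convenience (this is not part of the official setup instructions), a short standard-library snippet like the one below can confirm the Python version, operating system, and free disk space before the session:

```python
import platform
import shutil
import sys

print("Python version:", sys.version.split()[0])  # should start with 3
print("Operating system:", platform.system())     # Linux or Darwin (macOS) preferred
free_gb = shutil.disk_usage("/").free / 2**30
print(f"Free disk space: {free_gb:.1f} GB")        # at least 8 GB recommended
```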
# References
# - https://www.tensorflow.org/versions/r0.10/tutorials/word2vec/index.html
# - https://github.com/tensorflow/tensorflow/blob/r0.10/tensorflow/examples/tutorials/word2vec/word2vec_basic.py
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import collections
import math
<html>
<head>
<script src="https://www.gstatic.com/firebasejs/3.0.0/firebase.js"></script>
<title>ZeroToApp</title>
<style>
#messages { width: 40em; border: 1px solid grey; min-height: 20em; }
#messages img { max-width: 240px; max-height: 160px; display: block; }
#header { position: fixed; top: 0; background-color: white; }
.push { margin-bottom: 2em; }
@keyframes yellow-fade { 0% {background: #f2f2b8;} 100% {background: none;} }