Vicki Boykis veekaybee


Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of ChatGPT and the large language models (LLMs) that followed, there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (reinforcement learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine-tuning": learning to imitate human-written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case for RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

@anna-geller
anna-geller / test.yml
Created June 12, 2023 14:44
For Vicki
id: test
namespace: dev
tasks:
  - id: deploy
    type: io.kestra.core.tasks.flows.Worker
    tasks:
      - id: cloneRepository
        type: io.kestra.plugin.git.Clone
        url: https://github.com/veekaybee/viberary
        branch: main
@Tostino
Tostino / inkbot-chunked-summary.txt
Last active January 8, 2025 09:43
Generate a chunked summary prompt example
<#meta#>
- Date: 2023-10-05
- Task: summary
<#system#>
Your main objective is to condense the content of the document into a concise summary, capturing the main points and themes.
<#chat#>
<#user#>
Please read the provided Original section to understand the context and content. Use this understanding to generate a summary of the Original section, incorporating relevant details and maintaining coherence with the Prior Summary.
Notes:
@Maximilian-Winter
Maximilian-Winter / gbnf_grammar_generator.py
Last active November 17, 2024 06:13
GBNF grammar generator for always valid function calls and object creation in JSON with llama.cpp
import inspect
import json
import re
import typing
from inspect import isclass, getdoc
from types import NoneType
from pydantic import BaseModel, Field
from pydantic.fields import FieldInfo
from typing import Any, Type, List, get_args, get_origin, Tuple, Union, Optional
@kalomaze
kalomaze / llm_samplers_explained.md
Last active September 26, 2025 17:02
LLM Samplers Explained


Every time a large language model makes a prediction, each of the thousands of tokens in its vocabulary is assigned a probability, from almost 0% to almost 100%. There are different ways to choose among those predictions. This process is known as "sampling", and there are various strategies you can use, which I will cover here.

OpenAI Samplers

Temperature

  • Temperature controls the overall confidence of the model's scores (the logits). With a value lower than 1.0, the gaps between token probabilities become larger (more deterministic); with a value higher than 1.0, the gaps become smaller (less deterministic).
  • A temperature of 1.0 leaves the scores unchanged, so it yields the original distribution that the model was trained to optimize for.
  • Graph demonstration with voiceover: https://files.catbox.moe/6ht56x.mp4
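
The effect of temperature described above can be sketched in a few lines. This is a minimal illustration, assuming the common convention of dividing the logits by the temperature before applying softmax; the function name `softmax_with_temperature` and the example logits are made up for the demonstration and are not taken from the gist.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Turn raw scores (logits) into probabilities, scaled by a temperature.

    Assumes the usual convention: logits are divided by the temperature
    before softmax. Hypothetical helper, for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative logits for a three-token vocabulary.
example_logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(example_logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Running the loop shows the top token taking a larger share of the probability mass at temperature 0.5 than at 1.0, and a smaller share at 2.0, which matches the "more deterministic" and "less deterministic" behaviour in the bullets above.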
@vatsalsaglani
vatsalsaglani / mistral_ctx.py
Last active June 3, 2025 20:06
Token counting and message token management for MistralAI
from typing import List, Dict, Literal, Union
from transformers import AutoTokenizer


class MistralAICtx:
    def __init__(self, model_name: str):
        assert "mistral" in model_name, "MistralCtx only available for Mistral models"
        self.tokenizer = AutoTokenizer.from_pretrained(
            "mistralai/Mistral-7B-Instruct-v0.2")
@fritzprix
fritzprix / llm-commit.sh
Last active August 2, 2025 08:03
🤖 Generate git commit messages automatically using local LLM (Ollama). Simple bash script that analyzes your git diff and creates meaningful commit messages. No API keys, no cloud - runs locally with Ollama.
#!/bin/bash

# Get the git diff and save it to a temporary file
git diff --cached > /tmp/git_diff.txt

# If there's no diff, exit
if [ ! -s /tmp/git_diff.txt ]; then
    echo "No staged changes to commit"
    exit 1
fi