Junyan Xu alreadydone

Are multi-LLM-agent systems a thing? Yes they are. But.

Yoav Goldberg, Nov 24, 2024

This piece started with a pair of twitter and bluesky posts:

let's talk about "agents" (in the LLM sense). there's a lot of buzz around "multi-agent" systems where agents collaborate but... i don't really get how it differs from a thinking of a single agent with multiple modes of operation. what are the benefits of modeling as multi-agent?
— (((ل()(ل() 'yoav))))👾 (@yoavgo) November 23, 2024

FAQ on the xz-utils backdoor (CVE-2024-3094)

This is a living document. Everything in this document is written in good faith to be accurate, but like I just said, we don't yet know everything about what's going on.

Update: I've disabled comments as of 2025-01-26 to avoid everyone having notifications for something a year on if someone wants to suggest a correction. Folks are free to email to suggest corrections still, of course.

Background

Reinforcement Learning for Language Models

Yoav Goldberg, April 2023.

Why RL?

With the release of the ChatGPT model and followup large language models (LLMs), there was a lot of discussion of the importance of "RLHF training", that is, "reinforcement learning from human feedback". I was puzzled for a while as to why RL (Reinforcement Learning) is better than learning from demonstrations (a.k.a. supervised learning) for training language models. Shouldn't learning from demonstrations (or, in language model terminology, "instruction fine tuning", learning to imitate human written answers) be sufficient? I came up with a theoretical argument that was somewhat convincing. But I came to realize there is an additional argument which not only supports the case of RL training, but also requires it, in particular for models like ChatGPT. This additional argument is spelled out in (the first half of) a talk by John Schulman from OpenAI. This post pretty much

	import argparse
	import random
	import sys

	from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache
	import torch

	parser = argparse.ArgumentParser()
	parser.add_argument("question", type=str)
	parser.add_argument(

	import Mathlib.RingTheory.RingHom.FinitePresentation
	import Mathlib.RingTheory.Flat.Algebra
	import Mathlib.AlgebraicGeometry.Morphisms.FinitePresentation

	/-!
	Authors: Judith Ludwig, Christian Merten

	Note: This only works on the mathlib branch chrisflav/finpres.2
	-/

	#include <stdio.h>
	#include <stdlib.h>
	#include <math.h>
	#include <assert.h>

	#ifndef M_PI
	#define M_PI 3.14159265358979323846
	#endif

	// Constants for model dimensions, learning rate, etc.

	import collections
	from fractions import Fraction
	import os
	import re
	from typing import Dict, List, Set, Tuple, Union

	from datasets import load_dataset
	import pandas as pd

	import Mathlib.Tactic

	inductive E
	| lit : Bool → E
	| var : Nat → E
	| ite : E → E → E → E
	deriving DecidableEq

	def E.hasNestedIf : E → Bool
	| lit _ => false

	import order.zorn
	import category_theory.category.preorder
	import category_theory.limits.cones

	universes u v w

	namespace category_theory

	open limits