Oliver Mannion tekumara

Learning LLMs in 2025

So you know how the transformer works, and you know basic ML/DL, and you want to learn more about LLMs. One way to go is looking into the various "algorithmic" stuff (optimization algorithms, RL, DPO, etc). Lots of material on that. But the interesting stuff is (in my opinion at least) not there.

This is an attempt to collect a list of academic (or academic-like) materials that explore LLMs from other directions, and focus on the non-ML-algorithmic aspects.

Courses

David Chiang's Theory of Neural Networks course.
This is not primarily about LLMs, but does have a substantial section on Transformers. Formal/Theory. More of a book than a course.

Generating Synthetic Data for LLM Evaluation

Summary

1. Use your application extensively to build intuition about failure modes
2. Define 3-4 dimensions based on observed or anticipated failures
3. Create structured tuples covering your priority failure scenarios
4. Generate natural language queries from each tuple using a separate LLM call
5. Scale to more examples across your most important failure hypotheses (we suggest at least ~100)
6. Test and iterate on the most critical failure modes first, and generate more until you reach theoretical saturation
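
The dimension, tuple, and query steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dimension names and values, the prompt wording, and the helper names (`build_tuples`, `tuple_to_prompt`, `generate_queries`) are all hypothetical, and the LLM call is passed in as a plain function so that no particular API is assumed.

```python
import itertools
import random

# Hypothetical dimensions for a customer-support assistant; replace them
# with the failure modes you observed in your own application.
DIMENSIONS = {
    "feature": ["order tracking", "refund request", "account login"],
    "persona": ["first-time user", "frustrated repeat customer"],
    "scenario": ["ambiguous request", "missing information", "out-of-scope ask"],
}


def build_tuples(dimensions, limit=None, seed=0):
    """Return combinations of dimension values as dicts (optionally sampled)."""
    names = list(dimensions)
    combos = [
        dict(zip(names, values))
        for values in itertools.product(*dimensions.values())
    ]
    if limit is not None and limit < len(combos):
        # Seeded sampling keeps the chosen subset reproducible between runs.
        combos = random.Random(seed).sample(combos, limit)
    return combos


def tuple_to_prompt(t):
    """Render one tuple as the prompt for a separate query-writing LLM call."""
    lines = [f"- {name}: {value}" for name, value in t.items()]
    return (
        "Write one realistic user message for the situation below.\n"
        + "\n".join(lines)
        + "\nReturn only the message text."
    )


def generate_queries(tuples, call_llm):
    """call_llm is any function str -> str; plug in your own client here."""
    return [{"tuple": t, "query": call_llm(tuple_to_prompt(t))} for t in tuples]


if __name__ == "__main__":
    tuples = build_tuples(DIMENSIONS)
    # Stand-in for a real model call, used only to show the data flow.
    fake_llm = lambda prompt: f"<query for: {prompt.splitlines()[1]}>"
    for row in generate_queries(tuples[:3], fake_llm):
        print(row["query"])
```

Keeping the tuple alongside each generated query makes it possible to group later failures by dimension value, which helps when deciding which failure hypotheses to scale up.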

	# system prompt (always applied)

	<who_you_are>
	You are a superintelligent autonomous AGENT.
	You are assisting a USER in the context of a CONVERSATION represented as a chronological series of EVENTS.
	You have been trained on a vast amount of data from the entire history of human activity on the internet up to this date. You have a deep capacity to find answers to many subjects inside your training data.
	Your training data knowledge cutoff date is 2024-01-01.
	You are a relentless truth-seeker. If you are not sure about file content or factual information pertaining to the USER's request (for example, if it requires information PAST your training data knowledge cutoff date, or the information is not available in the EVENTs of the CONVERSATION), you MUST use your tools to gather the relevant information: do NOT guess or make up an answer.
	</who_you_are>

	# make sure sunshine is using "wlr" capture mode to see headless displays
	# create a new "Headless" app in sunshine for virtual display and set:

	# do commands
	sh -c "hyprctl keyword monitor HEADLESS-2,${SUNSHINE_CLIENT_WIDTH}x${SUNSHINE_CLIENT_HEIGHT}@${SUNSHINE_CLIENT_FPS},auto,1"
	hyprctl keyword monitor eDP-1,disable

	# matching undo commands
	hyprctl keyword monitor HEADLESS-2,disable
	hyprctl reload

	# Project Policy

	This policy provides a single, authoritative, and machine-readable source of truth for AI coding agents and humans, ensuring that all work is governed by clear, unambiguous rules and workflows. It aims to eliminate ambiguity, reduce supervision needs, and facilitate automation while maintaining accountability and compliance with best practices.

	# 1. Introduction

	> Rationale: Sets the context, actors, and compliance requirements for the policy, ensuring all participants understand their roles and responsibilities.

	## 1.1 Actors

	from datasets import load_dataset
	from sentence_transformers import (
	    SentenceTransformerTrainer,
	    SentenceTransformerTrainingArguments,
	)

	from pylate import losses, models, utils

	def main():
	    # ReasonIR does not re-upload the BRIGHT data, so we need to load it from the original source

	You are Manus, an AI agent created by the Manus team.

	You excel at the following tasks:
	1. Information gathering, fact-checking, and documentation
	2. Data processing, analysis, and visualization
	3. Writing multi-chapter articles and in-depth research reports
	4. Creating websites, applications, and tools
	5. Using programming to solve various problems beyond development
	6. Various tasks that can be accomplished using computers and the internet

	// Claude Code is a Beta product per Anthropic's Commercial Terms of Service.
	// By using Claude Code, you agree that all code acceptance or rejection decisions you make,
	// and the associated conversations in context, constitute Feedback under Anthropic's Commercial Terms,
	// and may be used to improve Anthropic's products, including training models.
	// You are responsible for reviewing any code suggestions before use.

	// (c) Anthropic PBC. All rights reserved. Use is subject to Anthropic's Commercial Terms of Service (https://www.anthropic.com/legal/commercial-terms).

	// Version: 0.2.9

	# the "verifiers" repository is a clean implementation of templated GRPO reinforcement learning training environments
	# this is a generic set of "install from scratch" commands complete with a deepspeed z3 config that i have been using when i spin up nodes
	# it will run on the gsm8k example w/ default batch size & generation size (8), and the 8th GPU is used for vllm generations
	# qwen 14b full finetuning will run on this configuration too without LoRA or CUDA OOM, at least for the gsm8k task's context sizes + generation lengths
	# hyperparameters are controlled by `verifiers/utils/config_utils.py`; i have been preferring extreme grad clipping (between 0.001 and 0.01) and low beta (under 0.01)

	# NOTE FEB 27: examples have moved into `verifiers/examples` not `/examples`

	cd /root
	mkdir boom

	#!/usr/bin/env python

	# jj-github-pr: Create GitHub PRs from jujutsu changes
	# Each change will be one PR. Relations between changes will be preserved by setting the PR base branch.
	# The commit title/message will be used for the PR title/body.
	# The command can be used multiple times to update the stack of PRs, titles and commits.
	# A "Relation chain" which shows the relation between the submitted PRs will be added to the PR body.
	# Usage: Run jj-github-pr <revision> to submit each jj change as a PR.
	#
	# Example usage: