Christopher Akiki cakiki

Learning LLMs in 2025

So you know how the transformer works, you know basic ML/DL, and you want to learn more about LLMs. One way to go is to look into the various "algorithmic" topics (optimization algorithms, RL, DPO, etc.). There is lots of material on that. But the interesting stuff is (in my opinion at least) not there.

This is an attempt to collect a list of academic (or academic-like) materials that explore LLMs from other directions, and focus on the non-ML-algorithmic aspects.

Courses

David Chiang's Theory of Neural Networks course.
This is not primarily about LLMs, but it does have a substantial section on Transformers. Formal/theory. More of a book than a course.

Thank you to tiny corp for pointing out some problems running BERT training with Tinygrad on AMD GPUs in this Tweet. We had a few engineers at AMD take a look at the problem and they were quickly able to reproduce it.

What they found was an issue related to CWSR (compute wave save restore), which is a mechanism that allows our driver and firmware to preempt and reschedule long-running compute waves on our GPUs. The GFXv11 GPU line requires a workaround to set COMPUTE_PGM_RSRC1.PRIV=1 when dispatching a compute kernel. Normally this is handled by the AQL DISPATCH packet. However, since the Tinygrad implementation leverages a custom runtime, it requires this workaround in its PM4-based dispatch. This patch is specific to GFXv11 GPUs. Other GPUs do not require it and should not use this workaround. The following KFDTest patch can be used as a reference: https://github.com/ROCm/ROCT-Thunk-Interface/commit/507637ed5b82197eecbf483cdc1234939766549a
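As a rough illustration of the workaround described above, a custom runtime could OR the PRIV bit into the COMPUTE_PGM_RSRC1 value before writing it in its PM4 dispatch, and only on GFXv11. This is a minimal sketch, not the code from the referenced patch: the helper name `patch_rsrc1` and its `gfx_major` parameter are hypothetical, and the bit position (20) is an assumption taken from the compute_pgm_rsrc1 layout in the LLVM AMDGPU kernel-descriptor documentation, so check it against the register headers for your target.

```python
# Assumption: PRIV is bit 20 of COMPUTE_PGM_RSRC1 (per the LLVM AMDGPU
# kernel-descriptor layout). Verify against the headers for your GPU.
COMPUTE_PGM_RSRC1_PRIV_BIT = 20


def patch_rsrc1(rsrc1: int, gfx_major: int) -> int:
    """Return rsrc1 with PRIV set on GFXv11; leave it unchanged elsewhere.

    `gfx_major` is a hypothetical parameter holding the major GFX version
    of the target device (11 for the GFXv11 line).
    """
    if gfx_major == 11:
        return rsrc1 | (1 << COMPUTE_PGM_RSRC1_PRIV_BIT)
    # Other GPU generations should not use this workaround.
    return rsrc1
```

The generation check is the important design point: the text states that other GPUs do not need the workaround and should not apply it.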

While inv

TPU VM Cheatsheet

This TPU VM cheatsheet uses and was tested with the following library versions:

Library	Version
JAX	0.3.25
FLAX	0.6.4
Datasets	2.10.1
Transformers	4.27.1
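
To confirm that an environment matches the versions in the table, a small check along the following lines may help. It is only a sketch: the helper name `check_versions` is hypothetical, and the lowercase keys are assumed to be the PyPI distribution names of the listed libraries.

```python
from importlib import metadata

# Versions from the table above. Keys are assumed PyPI distribution names.
PINNED = {
    "jax": "0.3.25",
    "flax": "0.6.4",
    "datasets": "2.10.1",
    "transformers": "4.27.1",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {name: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # distribution is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

An empty result means every listed library is installed at the tested version.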

	"""
	The most atomic way to train and run inference for a GPT in pure, dependency-free Python.
	This file is the complete algorithm.
	Everything else is just efficiency.

	@karpathy
	"""

	import os # os.path.exists
	import math # math.log, math.exp

	import os
	import sys
	import time
	import math
	import pickle
	from contextlib import nullcontext
	from pathlib import Path
	import subprocess
	from dataclasses import dataclass
	import inspect

	from huggingface_hub.hf_api import (  # type: ignore
	    REPO_TYPES,
	    REPO_TYPES_URL_PREFIXES,
	    HfApi,
	    _raise_for_status,
	)

	def update_repo_settings(
	    hf_api: HfApi,
	    repo_id: str,

	#
	# Author: Cody Buntain
	# Date: 19 March 2020
	#
	# Description:
	# This code is an example of using the agreement package
	#. in NLTK to calculate a number of agreement metrics on
	#. a set of annotations. Currently, this code will work
	#. with two annotators and multiple labels.
	#. You can use Fleiss's Kappa or Krippendorff's Alpha if you