This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Based upon ColossalAI OpenMoE | |
""" | |
import torch | |
from torch import nn | |
class MLPExperts(nn.Module): | |
def __init__( |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import torch | |
from transformers import AutoTokenizer | |
# load the llama-3.2 tokenizer | |
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.1-8B') | |
# raw text | |
text = "This raw text will be tokenized" | |
# create tokens using tokenizer |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Source: https://github.com/karpathy/nanoGPT/blob/master/model.py | |
""" | |
import math | |
import torch | |
from torch import nn | |
import torch.nn.functional as F | |
class MaskedSelfAttention(nn.Module): |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Source: https://github.com/karpathy/nanoGPT/blob/master/model.py | |
""" | |
import torch | |
from torch import nn | |
import torch.nn.functional as F | |
class GPT(nn.Module): |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Source: https://github.com/karpathy/nanoGPT/blob/master/model.py | |
""" | |
from torch import nn | |
class Block(nn.Module): | |
def __init__( | |
self, | |
d, |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Source: https://github.com/karpathy/nanoGPT/blob/master/model.py | |
""" | |
from torch import nn | |
class MLP(nn.Module): | |
def __init__( | |
self, |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import torch | |
# experiment settings | |
d = 5 | |
nlayers = 100 | |
normalize = False # set True to use normalization | |
# create vector with random entries between [-1, 1] | |
input_vector = (torch.rand(d) - 0.5) * 2.0 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
Source: https://github.com/karpathy/nanoGPT/blob/master/model.py | |
""" | |
import math | |
import torch | |
from torch import nn | |
import torch.nn.functional as F | |
class CausalSelfAttention(nn.Module): |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Summaries and Overviews:
- Modern LLMs: https://cameronrwolfe.substack.com/p/modern-llms-mt-nlg-chinchilla-gopher
- Specialized LLMs: https://cameronrwolfe.substack.com/p/specialized-llms-chatgpt-lamda-galactica
- Practical Prompt Engineering: https://cameronrwolfe.substack.com/p/practical-prompt-engineering-part
- Advanced Prompt Engineering: https://cameronrwolfe.substack.com/p/advanced-prompt-engineering
- LLM Training and Inference: https://cameronrwolfe.substack.com/p/language-model-training-and-inference
- Understanding SFT: https://cameronrwolfe.substack.com/p/understanding-and-using-supervised
- RLHF and Alternatives: https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives
- Data is the foundation of language models: https://cameronrwolfe.substack.com/p/data-is-the-foundation-of-language
- RLAIF: https://cameronrwolfe.substack.com/p/rlaif-reinforcement-learning-from
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Summaries and Overviews:
- History of AI: https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/
- GPT and GPT-2: https://cameronrwolfe.substack.com/p/language-models-gpt-and-gpt-2
- Modern LLMs: https://cameronrwolfe.substack.com/p/modern-llms-mt-nlg-chinchilla-gopher
- Scaling Laws and GPT-3: https://cameronrwolfe.substack.com/p/language-model-scaling-laws-and-gpt
- The Illustrated Transformer: http://jalammar.github.io/illustrated-transformer/
- Language Model Mechanics: https://cameronrwolfe.substack.com/i/135273362/the-mechanics-of-a-language-model
- BERT: https://cameronrwolfe.substack.com/p/language-understanding-with-bert
- Transformer Architecture (T5): https://cameronrwolfe.substack.com/p/t5-text-to-text-transformers-part
- Foundation Models: https://crfm.stanford.edu