import matplotlib.pyplot as plt

def calc_demon_decay(total_iter, curr_iter, min_val, max_val):
    # Demon (decaying momentum) schedule: decays from max_val to min_val
    z = float(total_iter - curr_iter) / total_iter
    return min_val + float(max_val - min_val) * (z / ((1.0 - 0.9) + 0.9 * z))

train_iters = 100
max_mom = 0.9
min_mom = 0.0

momentum = [calc_demon_decay(train_iters, i, min_mom, max_mom) for i in range(train_iters)]
plt.plot(momentum)
plt.title('Demon Decay')
plt.xlabel('Iteration')
plt.ylabel('Momentum')
plt.show()
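A sketch of how a schedule like this might drive an optimizer's momentum during training — the linear model and squared-error loss below are placeholders, not part of the original snippet:

```python
import torch

def calc_demon_decay(total_iter, curr_iter, min_val, max_val):
    z = float(total_iter - curr_iter) / total_iter
    return min_val + float(max_val - min_val) * (z / ((1.0 - 0.9) + 0.9 * z))

model = torch.nn.Linear(4, 1)  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

for it in range(100):
    # overwrite the momentum coefficient in-place before each step
    for group in opt.param_groups:
        group['momentum'] = calc_demon_decay(100, it, 0.0, 0.9)
    loss = model(torch.randn(8, 4)).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The schedule starts at `max_val` (here 0.9) and decays to `min_val` (0.0) as training ends.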
import torch

class FFNN(torch.nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers):
        super().__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers
        # stack of hidden layers with ReLU, followed by an output projection
        layers = [torch.nn.Linear(input_size, hidden_size), torch.nn.ReLU()]
        for _ in range(num_layers - 1):
            layers += [torch.nn.Linear(hidden_size, hidden_size), torch.nn.ReLU()]
        layers.append(torch.nn.Linear(hidden_size, output_size))
        self.net = torch.nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
Learning Rate Scheduling:
- Overview of useful LR schedules: https://cameronrwolfe.substack.com/p/the-best-learning-rate-schedules
- REX Paper: https://arxiv.org/abs/2107.04197

Precision Scheduling:
- Overview of Low Precision Training Techniques: https://cameronrwolfe.substack.com/p/quantized-training-with-deep-networks-82ea7f516dc6
- CPT Paper: https://arxiv.org/abs/2101.09868

Video Batch Size Scheduling:
- Overview of Video Deep Learning (Part One): https://cameronrwolfe.substack.com/p/deep-learning-on-video-part-one-the-early-days-8a3632ed47d4
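As a concrete illustration, a decaying LR profile like the one REX proposes can be plugged into PyTorch's `LambdaLR`. The `beta = 0.5` profile below follows the commonly stated form of the REX schedule; treat the exact formula as an assumption to verify against the paper:

```python
import torch

def rex_factor(step, total_steps, beta=0.5):
    # REX-style profile: multiplier decays from 1 toward 0 over training
    z = 1.0 - step / total_steps
    return z / ((1.0 - beta) + beta * z)

model = torch.nn.Linear(2, 1)  # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sched = torch.optim.lr_scheduler.LambdaLR(
    opt, lr_lambda=lambda s: rex_factor(s, total_steps=100))

for step in range(100):
    opt.step()    # gradient computation omitted for brevity
    sched.step()  # updates lr to 0.1 * rex_factor(step + 1, 100)
```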
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
Summaries and Overviews:
- GPT and GPT-2: https://cameronrwolfe.substack.com/p/language-models-gpt-and-gpt-2
- Scaling Laws and GPT-3: https://cameronrwolfe.substack.com/p/language-model-scaling-laws-and-gpt
- OPT-175B (Open-Source GPT-3): https://cameronrwolfe.substack.com/p/understanding-the-open-pre-trained-transformers-opt-library-193a29c14a15
- Modern LLMs: https://cameronrwolfe.substack.com/p/modern-llms-mt-nlg-chinchilla-gopher
- Specialized LLMs: https://cameronrwolfe.substack.com/p/specialized-llms-chatgpt-lamda-galactica
- Why does ChatGPT work?: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
- Orca: https://cameronrwolfe.substack.com/p/orca-properly-imitating-proprietary
- LLaMA: https://cameronrwolfe.substack.com/p/llama-llms-for-everyone
- MPT: https://cameronrwolfe.substack.com/p/democratizing-ai-mosaicmls-impact
[LLM Training and Fundamentals]
- GPT and GPT-2: https://cameronrwolfe.substack.com/p/language-models-gpt-and-gpt-2
- GPT-3 and LLM Scaling: https://cameronrwolfe.substack.com/p/language-model-scaling-laws-and-gpt
- Modern LLMs: https://cameronrwolfe.substack.com/p/modern-llms-mt-nlg-chinchilla-gopher
- Specialized LLMs: https://cameronrwolfe.substack.com/p/specialized-llms-chatgpt-lamda-galactica

[Open Source LLMs]
- LLaMA: https://cameronrwolfe.substack.com/p/llama-llms-for-everyone
- Beyond LLaMA (Imitation Models): https://cameronrwolfe.substack.com/p/beyond-llama-the-power-of-open-llms
- False Promise of Imitation: https://cameronrwolfe.substack.com/p/imitation-models-and-the-open-source
Summaries and Overviews:
- History of AI: https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/
- GPT and GPT-2: https://cameronrwolfe.substack.com/p/language-models-gpt-and-gpt-2
- Modern LLMs: https://cameronrwolfe.substack.com/p/modern-llms-mt-nlg-chinchilla-gopher
- Scaling Laws and GPT-3: https://cameronrwolfe.substack.com/p/language-model-scaling-laws-and-gpt
- The Illustrated Transformer: http://jalammar.github.io/illustrated-transformer/
- Language Model Mechanics: https://cameronrwolfe.substack.com/i/135273362/the-mechanics-of-a-language-model
- BERT: https://cameronrwolfe.substack.com/p/language-understanding-with-bert
- Transformer Architecture (T5): https://cameronrwolfe.substack.com/p/t5-text-to-text-transformers-part
- Foundation Models: https://crfm.stanford.edu
Summaries and Overviews:
- Modern LLMs: https://cameronrwolfe.substack.com/p/modern-llms-mt-nlg-chinchilla-gopher
- Specialized LLMs: https://cameronrwolfe.substack.com/p/specialized-llms-chatgpt-lamda-galactica
- Practical Prompt Engineering: https://cameronrwolfe.substack.com/p/practical-prompt-engineering-part
- Advanced Prompt Engineering: https://cameronrwolfe.substack.com/p/advanced-prompt-engineering
- LLM Training and Inference: https://cameronrwolfe.substack.com/p/language-model-training-and-inference
- Understanding SFT: https://cameronrwolfe.substack.com/p/understanding-and-using-supervised
- RLHF and Alternatives: https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives
- Data is the foundation of language models: https://cameronrwolfe.substack.com/p/data-is-the-foundation-of-language
- RLAIF: https://cameronrwolfe.substack.com/p/rlaif-reinforcement-learning-from
""" | |
Source: https://github.com/karpathy/nanoGPT/blob/master/model.py | |
""" | |
import math | |
import torch | |
from torch import nn | |
import torch.nn.functional as F | |
class CausalSelfAttention(nn.Module): |
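Since the class body is truncated in the excerpt, here is a minimal self-contained sketch of causal self-attention in the same spirit. Hyperparameters are passed directly rather than through nanoGPT's config object, and dropout is omitted — both are simplifications, not the original implementation:

```python
import torch
from torch import nn
import torch.nn.functional as F

class MinimalCausalSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.c_attn = nn.Linear(n_embd, 3 * n_embd)  # joint q, k, v projection
        self.c_proj = nn.Linear(n_embd, n_embd)      # output projection

    def forward(self, x):
        B, T, C = x.size()
        q, k, v = self.c_attn(x).split(C, dim=2)
        # reshape to (B, n_head, T, head_dim) for multi-head attention
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        # causal mask stops each position from attending to future tokens
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.c_proj(y)
```

`F.scaled_dot_product_attention` (PyTorch 2.0+) handles the causal masking and softmax internally.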
import torch

# experiment settings
d = 5
nlayers = 100
normalize = False  # set True to use normalization

# create vector with random entries between [-1, 1]
input_vector = (torch.rand(d) - 0.5) * 2.0

# repeatedly apply random linear layers (illustrative completion of the snippet)
x = input_vector
for _ in range(nlayers):
    x = torch.rand(d, d) @ x
    if normalize:
        x = x / x.norm()  # rescale to unit norm after each layer
print(x.norm().item())  # explodes without normalization, stays at 1.0 with it