Keller Jordan KellerJordan

KellerJordan / clean_airbench.py

Created September 23, 2024 07:22

	"""
	clean_airbench.py

	Variant of airbench94 which removes the following:
	* Lookahead optimization
	* Progressive freezing

	And increases the training duration slightly via the following:
	* Epochs 9.9 -> 10.0
	* Batch size 1024 -> 1000
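As a rough check on how much longer the run becomes, the two changes above can be turned into step counts. This is a minimal sketch, not code from the gist: `total_steps` is a hypothetical helper, and it assumes the standard 50,000-image CIFAR-10 training split, a loader that drops the last partial batch, and a total step count rounded up.

```python
import math

CIFAR10_TRAIN_IMAGES = 50_000  # size of the standard CIFAR-10 training split


def total_steps(epochs, batch_size, drop_last=True):
    """Optimizer steps for a run, assuming a fixed number of batches per epoch.

    Hypothetical helper: the drop_last behaviour and the ceil rounding are
    assumptions, not taken from clean_airbench.py.
    """
    if drop_last:
        steps_per_epoch = CIFAR10_TRAIN_IMAGES // batch_size
    else:
        steps_per_epoch = math.ceil(CIFAR10_TRAIN_IMAGES / batch_size)
    return math.ceil(epochs * steps_per_epoch)


before = total_steps(9.9, 1024)   # 48 steps/epoch -> 476 steps
after = total_steps(10.0, 1000)   # 50 steps/epoch -> 500 steps
print(before, after)
print(before * 1024, after * 1000)  # images processed over the whole run
```

Under these assumptions the run grows from 476 to 500 steps, and a batch size of 1000 divides the training set evenly, so no images are dropped at the end of an epoch.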

KellerJordan / sweep_casted.py

Created September 22, 2024 15:05

	"""
	sweep_casted.py

	The base training here is a variant of clean_airbench which uses the renormalized optimizer,
	which is optimal for CIFAR-10 training.
	We also removed dirac initialization, for the purpose of casting experiments.

	In full precision, attains 93.78 mean accuracy (n=50).

	This code sweeps over a large number of number formats and widths, training 100

KellerJordan / nobn_airbench94.py

Created April 13, 2024 02:10

	"""
	BatchNorm-free variant of airbench94
	90.6% mean accuracy in ~6 seconds on an H100
	Changes relative to airbench94:
	- removed BatchNorms and added conv biases
	- reduced batch size 1024 -> 384
	- reduced weight decay 0.015 -> 0.001
	- reduced lr 11.5 -> 10.0
	- increased epochs 9.9 -> 11
	"""

KellerJordan / cifar10_resnet18_nesterov_150epochs_512batchsize.py

Last active April 18, 2024 18:22

KellerJordan / dawnbench_dpage.py

Created December 20, 2023 08:47

Standard PyTorch version of the final training script from David Page's "How to Train Your ResNet".

	# dawnbench_dpage.py
	# This script aims for exact equivalence to the final training procedure described in David Page's post
	# https://myrtle.ai/learn/how-to-train-your-resnet-8-bag-of-tricks/, while also being highly readable and
	# adhering to typical PyTorch conventions.
	#
	# We ran the following test for equivalence. We executed the final (10-epoch) training configuration provided
	# by David's code release https://github.com/davidcpage/cifar10-fast/blob/master/bag_of_tricks.ipynb a total
	# of n=400 times, and we executed this script n=300 times. We observed that the original notebook code yielded
	# a mean accuracy of 94.10%, and our script yielded a mean accuracy of 94.09%. We calculated the statistical
	# significance of this difference, finding it to be insignificant (p=0.44).
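The comparison described above can be sketched as a two-sample test on per-run accuracies. The gist preview does not say which test was used, so the following is only an illustration under stated assumptions: `welch_t_test` is a hypothetical helper implementing Welch's t-test with a normal approximation for the p-value (reasonable at n=400 and n=300), and the per-run accuracies are synthetic, drawn with an assumed spread, so the result will not reproduce the reported p=0.44.

```python
import math
import random
import statistics


def welch_t_test(a, b):
    """Two-sided Welch's t-test on two samples.

    The p-value uses a normal approximation to the t distribution,
    which is adequate only when both samples are large.
    """
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    t = (mean_a - mean_b) / standard_error
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Synthetic stand-ins for the real per-run accuracies (not the author's data).
rng = random.Random(0)
ASSUMED_STD = 0.13  # hypothetical run-to-run standard deviation, in accuracy points
notebook_runs = [rng.gauss(94.10, ASSUMED_STD) for _ in range(400)]
script_runs = [rng.gauss(94.09, ASSUMED_STD) for _ in range(300)]

t_stat, p_value = welch_t_test(notebook_runs, script_runs)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

With real data, the two lists would hold the final test accuracy of each of the 400 notebook runs and 300 script runs. A large p-value means the observed gap in means is consistent with run-to-run noise, which is the sense in which the two implementations are treated as equivalent.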