"""
clean_airbench.py
Variant of airbench94 which removes the following:
* Lookahead optimization
* Progressive freezing
And increases the training duration slightly via the following:
* Epochs 9.9 -> 10.0
* Batch size 1024 -> 1000 |
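To put a number on "slightly", here is a quick sketch of the step-count arithmetic. It assumes the standard 50,000-image CIFAR-10 training set and approximates optimizer steps as epochs × 50,000 / batch size; the helper `total_steps` is hypothetical and ignores how the real script handles a partial final batch.

```python
# Hypothetical helper: approximate optimizer steps for a training run.
# Assumes 50,000 CIFAR-10 training images and ignores partial-batch handling.
TRAIN_IMAGES = 50_000


def total_steps(epochs: float, batch_size: int) -> float:
    return epochs * TRAIN_IMAGES / batch_size


airbench94 = total_steps(9.9, 1024)  # ~483.4 steps
clean = total_steps(10.0, 1000)      # 500.0 steps
print(f"airbench94: {airbench94:.1f} steps, clean_airbench: {clean:.1f} steps")
print(f"increase: {100 * (clean / airbench94 - 1):.1f}%")
```

Under these assumptions the run takes about 3.4% more optimizer steps while seeing only about 1% more images (500,000 vs 495,000).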
"""
sweep_casted.py
The base training here is a variant of clean_airbench that uses the renormalized optimizer,
which is optimal for CIFAR-10 training.
We also removed dirac initialization for the purpose of the casting experiments.
In full precision, it attains 93.78% mean accuracy (n=50).
This code sweeps over many number formats and widths, training 100
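The docstring does not show how values are cast, so the following is only a minimal sketch of one common way to simulate a narrower floating-point format: rounding a value to a given number of mantissa (fraction) bits with round-half-to-even. The function `round_to_mantissa_bits` is hypothetical, and it does not model a limited exponent range, subnormals, or overflow.

```python
import math
import struct


def round_to_mantissa_bits(x: float, mantissa_bits: int) -> float:
    """Round x to the nearest float with `mantissa_bits` explicit fraction bits.

    Hypothetical helper: uses round-half-to-even like IEEE 754, but leaves the
    exponent range unbounded (no subnormals, overflow, or underflow).
    """
    if x == 0.0 or math.isinf(x) or math.isnan(x):
        return x
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1
    # m holds the leading bit plus the fraction bits, so keep mantissa_bits + 1 bits.
    scale = 2 ** (mantissa_bits + 1)
    return math.ldexp(round(m * scale) / scale, e)  # Python's round() is half-to-even


# Sweep a few widths: 7 fraction bits matches bfloat16, 10 matches float16,
# and 23 matches float32 (ignoring their differing exponent ranges).
for bits in (2, 4, 7, 10, 23):
    print(bits, round_to_mantissa_bits(math.pi, bits))

# Sanity check against a real float32 round trip through struct.
as_float32 = struct.unpack("f", struct.pack("f", math.pi))[0]
print(as_float32 == round_to_mantissa_bits(math.pi, 23))
```

Simulating the rounding in full precision like this lets a sweep cover arbitrary widths, including ones with no native hardware type.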
"""
BatchNorm-free variant of airbench94
90.6% mean accuracy in ~6 seconds on an H100
Changes relative to airbench94:
- removed BatchNorms and added conv biases
- reduced batch size 1024 -> 384
- reduced weight decay 0.015 -> 0.001
- reduced lr 11.5 -> 10.0
- increased epochs 9.9 -> 11
""" |
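The docstring does not explain why conv biases are added, but one likely reason is that BatchNorm subtracts the per-channel batch mean, so a constant bias added before it cancels exactly (BatchNorm's own learnable shift does that job instead). Once BatchNorm is removed, the conv bias is the only per-channel offset left. A minimal pure-Python sketch of the cancellation, using a hypothetical `batch_norm` over a single channel without the learnable scale and shift:

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize one channel's activations over the batch (no learnable affine)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


activations = [0.3, -1.2, 2.5, 0.8]  # made-up activations for one channel
bias = 4.0                           # a per-channel conv bias

with_bias = batch_norm([a + bias for a in activations])
without_bias = batch_norm(activations)

# The bias shifts the batch mean by the same amount, so it cancels out.
max_diff = max(abs(x - y) for x, y in zip(with_bias, without_bias))
print(max_diff)  # zero up to floating-point error
```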
"""
Results from 5 seeds:
95.28, 95.35, 95.17, 95.26, 95.28
"""
#############################################
# Setup/Hyperparameters #
#############################################
import os |
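The five seed results in the docstring above can be summarized with Python's standard `statistics` module; a short sketch using the sample standard deviation (n - 1 denominator):

```python
import statistics

# Accuracies (%) from the 5 seeds listed in the docstring.
results = [95.28, 95.35, 95.17, 95.26, 95.28]

mean = statistics.mean(results)
stdev = statistics.stdev(results)  # sample standard deviation (n - 1 denominator)
print(f"mean {mean:.3f}%, stdev {stdev:.3f}% over n={len(results)} seeds")
```

This gives a mean of about 95.27% with a sample standard deviation of about 0.065 percentage points.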
# dawnbench_dpage.py
# This script aims for exact equivalence to the final training procedure described in David Page's post
# https://myrtle.ai/learn/how-to-train-your-resnet-8-bag-of-tricks/, while also being highly readable and
# adhering to typical PyTorch conventions.
#
# We ran the following test for equivalence. We executed the final (10-epoch) training configuration provided
# by David's code release https://github.com/davidcpage/cifar10-fast/blob/master/bag_of_tricks.ipynb a total
# of n=400 times, and we executed this script n=300 times. We observed that the original notebook code yielded
# a mean accuracy of 94.10%, and our script yielded a mean accuracy of 94.09%. We calculated the statistical
# significance of this difference and found it to be insignificant (p=0.44).
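The comment does not say which test produced p=0.44. As a hedged sketch only, here is a two-sample test for a difference in means computed from summary statistics, using the Welch (unpooled) standard error and the normal approximation, which is reasonable at n=400 and n=300. The function `two_sample_z_test` is hypothetical, and the standard deviation passed to it below is a placeholder, since the source reports only means and run counts; the printed p-value is illustrative and is not a reproduction of the reported result.

```python
import math


def two_sample_z_test(mean_a, std_a, n_a, mean_b, std_b, n_b):
    """Two-sided test for a difference in means (Welch standard error, normal approx)."""
    se = math.sqrt(std_a ** 2 / n_a + std_b ** 2 / n_b)  # unpooled standard error
    z = (mean_a - mean_b) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value, equal to 2 * (1 - Phi(|z|))
    return z, p


# Placeholder standard deviation (NOT from the source), combined with the
# reported means and run counts.
std_placeholder = 0.15
z, p = two_sample_z_test(94.10, std_placeholder, 400, 94.09, std_placeholder, 300)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With the real per-run standard deviations, the same function would give the corresponding p-value under these assumptions.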