Some hparams related to RegNets (and other nets) in TPU training series https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights
All models were trained on x8 TPUs, so global batch size == batch_size * 8.
If a weight name contains `ra3`, it means rmsproptf + mixup + cutmix + random erasing + (usually) lr noise + rand-aug + head dropout + drop path (stochastic depth). The older `ra2` scheme was very similar but had no cutmix, and its rand-aug magnitude always used normal (Gaussian) sampling (`mstd0.5` or `mstd1.0`), whereas `ra3` often (not always) uses uniform sampling (`mstd101`).
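To make the `mstd` distinction concrete, here is a minimal sketch of how the per-op rand-aug magnitude could be sampled under the two schemes. This is an illustrative simplification, not timm's actual `auto_augment.py` code; the function name `sample_magnitude` and the treatment of `mstd > 100` as the uniform-sampling switch are assumptions for this sketch.

```python
import random

def sample_magnitude(magnitude: float, mstd: float) -> float:
    """Sketch of per-op rand-aug magnitude sampling (not timm's exact code).

    mstd > 100 (e.g. the `mstd101` seen in ra3 recipes) is assumed here to
    switch from Gaussian to uniform sampling of the magnitude.
    """
    if mstd > 100:
        # uniform draw over [0, magnitude]
        return random.uniform(0.0, magnitude)
    if mstd > 0:
        # Gaussian around the configured magnitude (ra2-style mstd0.5 / mstd1.0),
        # clamped to the valid [0, 10] magnitude range
        return min(max(random.gauss(magnitude, mstd), 0.0), 10.0)
    return magnitude  # mstd == 0: fixed magnitude

random.seed(0)
print(sample_magnitude(9.0, 101))  # uniform draw somewhere in [0, 9]
print(sample_magnitude(9.0, 0.5))  # gaussian draw near 9
```

The practical effect: uniform sampling spreads augmentation strength over the whole range every step, while Gaussian sampling keeps it concentrated near the configured magnitude.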
Some weights were trained with sgd + grad clipping (`cx` in the name, where x is one of h, 1, 2, 3); h = amped-up augreg.
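For reference, the grad clipping mentioned above is typically global-norm clipping (in timm it is applied via `torch.nn.utils.clip_grad_norm_`). A minimal pure-Python sketch of the idea, with a hypothetical helper name, under the assumption of L2 global-norm clipping:

```python
import math

def clip_global_norm(grads, max_norm):
    """Scale gradients so their global L2 norm does not exceed max_norm.

    Pure-Python sketch of global-norm gradient clipping; real training code
    would use torch.nn.utils.clip_grad_norm_ on the model's parameters.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        # scale every gradient by the same factor so direction is preserved
        scale = max_norm / (total_norm + 1e-6)
        grads = [g * scale for g in grads]
    return grads

print(clip_global_norm([3.0, 4.0], 1.0))  # norm 5 -> rescaled to norm ~1
print(clip_global_norm([0.1, 0.1], 1.0))  # already under the cap, unchanged
```

Clipping like this tames occasional large gradient spikes, which is part of why sgd can stay competitive with rmsproptf under heavy augmentation.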
I believe the 064 regnety was very close with both the ra3 and sgd approaches; the hparams I kept were the sgd ones, but I believe the published weights were rmsproptf, which edged ahead by a hair.