Ross Wightman (rwightman)

import math
import os
from collections import defaultdict
from pathlib import Path

# Fast transfers via the Rust-based `hf-transfer` backend (`pip install hf-transfer`).
# huggingface_hub reads this flag at import time, so set it before the import below.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import CommitOperationAdd, preupload_lfs_files, create_commit
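
The preview above only shows the setup; a minimal sketch of how these imports are typically combined to pre-upload large files and then commit them in one go follows. The repo id, folder path, and commit message are placeholders, not part of the original gist.

```python
from pathlib import Path
from huggingface_hub import CommitOperationAdd, preupload_lfs_files, create_commit

repo_id = "your-username/your-model"  # placeholder repo id

operations = []
for path in Path("checkpoints").glob("*.safetensors"):  # placeholder local folder/pattern
    op = CommitOperationAdd(path_in_repo=path.name, path_or_fileobj=str(path))
    # Pre-upload each LFS file so the final create_commit call is small and fast.
    preupload_lfs_files(repo_id, additions=[op])
    operations.append(op)

create_commit(repo_id, operations=operations, commit_message="Upload checkpoints")
```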
rwightman / _README_ViT_SBB_Hparams.md (last active January 9, 2025)
Searching for Better ViT Baselines Hparams
rwightman / _README_MobileNetV4.md (last active November 18, 2024)
MobileNetV4 hparams

MobileNetV4 Hparams

The included YAML files are timm train-script configs for training MobileNetV4 models in timm (pretrained weights on the HF Hub: https://huggingface.co/collections/timm/mobilenetv4-pretrained-weights-6669c22cda4db4244def9637).

Note the number of GPUs in each config; it needs to be taken into account for global batch size equivalence and LR scaling.

Also note that some models have lr set to a non-null value; if set, this LR is used directly. Otherwise, it falls back to lr_base, and the rate actually used is calculated from lr_base_size with sqrt scaling according to the global batch size.
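
As a rough sketch of that fallback (the argument names echo the timm train-script config fields, but this is just an illustration of the sqrt scaling rule, not timm's actual implementation; the numbers in the example are made up):

```python
import math

def resolve_lr(lr, lr_base, lr_base_size, batch_size, num_gpus, grad_accum=1):
    # If lr is set explicitly in the config, it is used directly.
    if lr is not None:
        return lr
    # Otherwise scale lr_base by the square root of the global batch size
    # relative to lr_base_size (the batch size lr_base is referenced to).
    global_batch_size = batch_size * num_gpus * grad_accum
    return lr_base * math.sqrt(global_batch_size / lr_base_size)

# Illustrative values only: lr_base=2e-3 at lr_base_size=4096,
# per-GPU batch 256 on 8 GPUs -> global 2048 -> 2e-3 * sqrt(0.5) ~= 1.41e-3
print(resolve_lr(None, 2e-3, 4096, 256, 8))
```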

Models with ix in the tag use an alternative init for the MQA attention projections: Xavier (Glorot) uniform instead of the EfficientNet/MobileNet defaults. This seemed to improve stability of the hybrid models and allowed a larger (closer to 1) beta2 for Adam; otherwise beta2, or the LR, needed to be reduced to avoid instability with the hybrids.
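
For illustration only, a sketch of what re-initializing those projections with Xavier uniform could look like; the name-based filter is an assumption for the example, not timm's actual MQA module naming or init code.

```python
import torch.nn as nn

def xavier_init_attn_projections(model: nn.Module):
    # Re-initialize attention projection layers with Xavier (Glorot) uniform
    # instead of the EfficientNet/MobileNet default init.
    # The substring match below is a guess at how projections might be identified.
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.Linear)) and any(
            key in name for key in ("query", "key", "value", "attn.output")
        ):
            nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
```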

Hparams for the tiny test timm models are all pretty similar; vit3 extended the schedule from the default 1600 epochs to 1800. All are based on the MobileNetV4 template with AdamW and a reduced beta1, grinding out the training. Some of the smallest models used a slightly lower AA magnitude (3), while the slightly higher-capacity ones increased it to m5 or m6.