The included yaml files are timm train script configs for training MobileNetV4 models in timm (see the weights on the HF Hub: https://huggingface.co/collections/timm/mobilenetv4-pretrained-weights-6669c22cda4db4244def9637).
Note the # of GPUs in each config; this needs to be taken into account for global batch size equivalence and LR scaling.
Also note that some models have `lr` set to a non-null value; if set, this LR is used directly. Otherwise, the script falls back to `lr_base`, and the rate actually used is calculated from `lr_base_size` with sqrt scaling according to the global batch size.
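For reference, a minimal Python sketch of that fallback calculation (assuming the config fields `lr`, `lr_base`, and `lr_base_size` map to the arguments below; the exact timm implementation may differ slightly):

```python
import math

def resolve_lr(lr, lr_base, lr_base_size, batch_size, num_gpus, grad_accum=1):
    """Explicit lr wins; otherwise sqrt-scale lr_base by the global batch size."""
    if lr is not None:
        return lr  # lr set to a non-null value in the config is used directly
    global_batch_size = batch_size * num_gpus * grad_accum
    # sqrt scaling relative to the reference batch size (lr_base_size)
    return lr_base * math.sqrt(global_batch_size / lr_base_size)

# e.g. lr_base=0.1 at lr_base_size=4096, trained at a global batch size of 1024:
# 0.1 * sqrt(1024 / 4096) = 0.05
```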
Models with `ix` in the tag use an alternative init for the MQA attention model projections: xavier (glorot) uniform instead of the efficientnet/mobilenet defaults. This seemed to improve stability of the hybrid models and allowed a larger (closer to 1) beta2 for adam; otherwise the adam beta2 or the LR needed to be reduced to avoid instability with the hybrids.
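As an illustration only (module and attribute names here are hypothetical, not timm's actual ones), swapping in xavier uniform for the attention projection weights might look like:

```python
import torch.nn as nn

def init_attn_projections_xavier(model: nn.Module):
    """Re-init attention projection weights with xavier (glorot) uniform
    in place of the efficientnet/mobilenet default init."""
    for name, module in model.named_modules():
        # assumes attention projections are identifiable by an 'attn' name component
        if 'attn' in name and isinstance(module, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
```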
To use a .yaml file, pass it via the `--config` argument of the timm `train.py` script, e.g. `train.py --config mnv4.yaml --data-dir /where/my/data ... <other arg overrides>`.
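Under the hood, the train script reads the `--config` file and applies its values as argparse defaults, so flags passed on the command line still override the yaml. Roughly (a simplified sketch of that pattern, not the verbatim script):

```python
import argparse
import yaml

config_parser = argparse.ArgumentParser(add_help=False)
config_parser.add_argument('-c', '--config', default='', type=str)

parser = argparse.ArgumentParser(description='Training')
parser.add_argument('--data-dir', type=str)
parser.add_argument('--lr', type=float, default=None)
# ... all other train args ...

def parse_args():
    args_config, remaining = config_parser.parse_known_args()
    if args_config.config:
        with open(args_config.config) as f:
            # yaml values become argparse defaults; explicit CLI flags still win
            parser.set_defaults(**yaml.safe_load(f))
    return parser.parse_args(remaining)
```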