
@rwightman
Last active January 21, 2025 20:58
Recent timm hparams...

A variety of hparams used recently in timm to train ViT, ConvNeXt, and ViT hybrids (MaxViT, CoAtNet).

All are variations on the same theme (DeiT / Swin pretraining), with different tweaks here and there.

These were all run on 4-8 GPU or TPU devices. Most use --lr-base, which rescales the LR automatically based on the global batch size (relative to --lr-base-size), so adapting them to a different number of GPUs should work well within a range; running at significantly lower or higher global batch sizes will require re-running an LR search. (The convnext_nano and ViT fine-tune configs set lr directly instead, and an explicit lr takes precedence over --lr-base.)
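A minimal sketch of that LR rescaling, written from my reading of timm's train.py (details may differ between timm versions, so check yours): when lr is unset, lr_base is multiplied by global_batch / lr_base_size, and with an empty lr_base_scale that ratio is square-rooted for optimizers whose name contains 'ada' or 'lamb' (so adamw here) and used linearly otherwise. The function name resolve_lr is hypothetical.

```python
def resolve_lr(lr, lr_base, lr_base_size, lr_base_scale, opt, batch_size, world_size):
    """Hypothetical helper sketching how --lr-base turns into an actual LR."""
    if lr is not None:
        # An explicit lr always wins; lr_base / lr_base_size are ignored.
        return lr
    global_batch_size = batch_size * world_size
    batch_ratio = global_batch_size / lr_base_size
    if not lr_base_scale:
        # Empty scale: sqrt for adaptive optimizers ('ada' / 'lamb' in the name), else linear.
        opt_name = opt.lower()
        lr_base_scale = 'sqrt' if any(o in opt_name for o in ('ada', 'lamb')) else 'linear'
    if lr_base_scale == 'sqrt':
        batch_ratio = batch_ratio ** 0.5
    return lr_base * batch_ratio


# coatnet_0 config, assuming 8 devices x batch_size 768 = 6144 global batch
print(resolve_lr(None, 0.001, 4096, '', 'adamw', batch_size=768, world_size=8))
```

With the coatnet_0 values on an assumed 8 devices, the batch ratio is 6144 / 4096 = 1.5, square-rooted for adamw, giving an LR of about 1.22e-3. Halving the device count lowers it by a factor of sqrt(2) rather than 2.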

More recently, DeiT-III has proven to be a very compelling set of hparams for ViT-like models. I've yet to do full runs with it myself, but its hparams can be adapted to the timm train scripts (3A aug was added recently): https://github.com/facebookresearch/deit/blob/main/README_revenge.md

To use the YAML files directly with the timm train script:

./distributed_train.sh 8 /path/to/dataset --config coat.yaml --model my_model_to_train --params 1 --to 2 --override 3

The last three arguments are just examples: any param in the config can be overridden by adding it on the command line. All of the configs below except the vit_base_patch16_224.augreg_in21k fine-tune (which already uses an image folder) were trained with TFDS. To use an image folder instead, the dataset: field must be changed; setting it to '' (empty string) in the config file works, as does passing --dataset imagefolder/ on the command line.
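The override behaviour comes from a two-stage argparse parse: the YAML keys become parser defaults, then the remaining command-line flags are parsed on top of them. That is also why the YAML keys use argparse dest names (batch_size, lr_base) while the flags use hyphens (--batch-size, --lr-base). Below is a stripped-down sketch of that pattern; the tiny flag set and the load_config callable (standing in for yaml.safe_load on the file) are assumptions for illustration, not timm's actual parser.

```python
import argparse


def parse_with_config(argv, load_config):
    """Two-stage parse: config values become defaults, CLI flags override them."""
    # Stage 1: only look for --config, leave everything else for stage 2.
    config_parser = argparse.ArgumentParser(add_help=False)
    config_parser.add_argument('-c', '--config', default='')

    # Stage 2: the real parser (a tiny, illustrative subset of flags).
    parser = argparse.ArgumentParser()
    parser.add_argument('data_dir', nargs='?')
    parser.add_argument('--model', default='resnet50')
    parser.add_argument('--batch-size', type=int, default=128)
    parser.add_argument('--lr-base', type=float, default=0.1)

    args_config, remaining = config_parser.parse_known_args(argv)
    if args_config.config:
        # In timm this would be yaml.safe_load() on the file; keys must match argparse dests.
        parser.set_defaults(**load_config(args_config.config))
    return parser.parse_args(remaining)


config = {'model': 'coatnet_0', 'batch_size': 768, 'lr_base': 0.001}
args = parse_with_config(
    ['/path/to/dataset', '--config', 'coat.yaml', '--batch-size', '256'],
    load_config=lambda path: config,
)
print(args.model, args.batch_size, args.lr_base)
```

Here model and lr_base come from the config, while --batch-size on the command line replaces the config's 768.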

# coatnet_0, ImageNet-1k, from scratch
aa: rand-m9-inc1-mstd1.0
amp: false
aug_splits: 0
batch_size: 768
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: null
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 100
decay_rate: 0.1
dist_bn: reduce
drop: 0.1
drop_block: null
drop_connect: null
drop_path: 0.2
epoch_repeats: 0.0
epochs: 600
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.001
lr_base_scale: ''
lr_base_size: 4096
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-07
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: coatnet_0
model_ema: false
model_ema_decay: 0.9998
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 2
recovery_interval: 0
remode: pixel
reprob: 0.3
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 5.0e-07
weight_decay: 0.05
workers: 8
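The aa strings in these configs use timm's RandAugment mini-language. A hypothetical decoder showing how I read the fields: m is the magnitude (out of 10), n the number of ops applied per image (default 2), inc1 selects the increasing-severity op variants, and mstd is the std of Gaussian noise added to the magnitude. In the timm versions I've looked at, an mstd above 100 switches to sampling the magnitude uniformly, so mstd101 in later configs is deliberate rather than a typo for mstd1.0. This sketch only decodes the string; it builds no transforms.

```python
import math
import re


def parse_rand_aug(config_str: str) -> dict:
    """Hypothetical decoder for timm-style 'rand-...' strings (field meanings only)."""
    parts = config_str.split('-')
    if parts[0] != 'rand':
        raise ValueError(f'not a RandAugment string: {config_str!r}')
    # Defaults as I understand them: magnitude 10, 2 ops per image, no magnitude noise.
    cfg = {'magnitude': 10, 'num_ops': 2, 'increasing': False, 'magnitude_std': 0.0}
    for part in parts[1:]:
        match = re.fullmatch(r'([a-z]+)(\d.*)', part)
        if match is None:
            continue  # ignore fields this sketch doesn't model
        key, val = match.groups()
        if key == 'm':
            cfg['magnitude'] = int(val)
        elif key == 'n':
            cfg['num_ops'] = int(val)
        elif key == 'inc':
            cfg['increasing'] = bool(int(val))
        elif key == 'mstd':
            std = float(val)
            # A std above 100 is treated as "sample the magnitude uniformly".
            cfg['magnitude_std'] = math.inf if std > 100 else std
    return cfg


for s in ('rand-m9-inc1-mstd1.0', 'rand-m8-inc1-mstd101', 'rand-m8-n3-inc1-mstd101'):
    print(s, parse_rand_aug(s))
```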

# coatnet_2_rw_224, ImageNet-12k (11821 classes), from scratch
aa: rand-m8-inc1-mstd101
amp: false
aug_splits: 0
batch_size: 96
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 2.0
clip_mode: norm
color_jitter: null
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet12k
dataset: tfds/imagenet12k
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop: 0.133
drop_block: null
drop_connect: null
drop_path: 0.35
epoch_repeats: 0.0
epochs: 220
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.0008
lr_base_scale: ''
lr_base_size: 1024
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-07
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: coatnet_2_rw_224
model_ema: false
model_ema_decay: 0.9998
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: 11821
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 2
recovery_interval: 0
remode: pixel
reprob: 0.3
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 5.0e-07
weight_decay: 0.05
workers: 8

# coatnet_2_rw_224, ImageNet-1k, from scratch
aa: rand-m8-n3-inc1-mstd101
amp: false
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 2.0
clip_mode: norm
color_jitter: null
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop: 0.133
drop_block: null
drop_connect: null
drop_path: 0.35
epoch_repeats: 0.0
epochs: 600
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.001
lr_base_scale: ''
lr_base_size: 1024
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-07
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: coatnet_2_rw_224
model_ema: false
model_ema_decay: 0.9998
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 2
recovery_interval: 0
remode: pixel
reprob: 0.3
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 5.0e-07
weight_decay: 0.05
workers: 8
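The drop_path values (0.1 to 0.35 across these configs) are maximum rates. In the timm models I'm familiar with (ViT, ConvNeXt, MaxViT / CoAtNet), the per-block stochastic-depth probability ramps linearly from 0 at the first block to drop_path at the last, so a high-looking 0.35 only applies in full to the deepest blocks. A sketch of that schedule; the stage depths in the example are made up, not coatnet_2's actual layout.

```python
def drop_path_rates(drop_path_rate, depths):
    """Per-block stochastic-depth rates, linearly spaced from 0 to drop_path_rate."""
    total = sum(depths)
    # Equivalent to torch.linspace(0, drop_path_rate, total), in plain Python.
    flat = [drop_path_rate * i / (total - 1) if total > 1 else 0.0 for i in range(total)]
    # Split the flat list back into one list per stage.
    rates, start = [], 0
    for depth in depths:
        rates.append(flat[start:start + depth])
        start += depth
    return rates


# Made-up stage depths, just to show the shape of the schedule.
print(drop_path_rates(0.35, [2, 3, 5, 2]))
```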

# convnext_nano, ImageNet-1k, from scratch
aa: rand-m8-inc1-mstd101
amp: true
aug_splits: 0
batch_size: 320
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 10
class_map: ''
clip_grad: null
clip_mode: norm
color_jitter: null
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop: 0.2
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 600
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.001
lr_base: 0.1
lr_base_scale: ''
lr_base_size: 256
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-07
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: convnext_nano
model_ema: false
model_ema_decay: 0.9998
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: fusedadamw
opt_betas: null
opt_eps: null
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.2
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 42
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 5
warmup_lr: 5.0e-07
weight_decay: 0.05
workers: 8

# maxvit_nano_rw, ImageNet-1k, from scratch
aa: rand-m8-inc1-mstd101
amp: false
aug_splits: 0
batch_size: 256
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 2.0
clip_mode: norm
color_jitter: null
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop: 0.133
drop_block: null
drop_connect: null
drop_path: 0.2
epoch_repeats: 0.0
epochs: 600
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.001
lr_base_scale: ''
lr_base_size: 4096
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-07
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: maxvit_nano_rw
model_ema: false
model_ema_decay: 0.9998
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 2
recovery_interval: 0
remode: pixel
reprob: 0.3
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 5.0e-07
weight_decay: 0.05
workers: 8

# maxvit_tiny_rw_256, ImageNet-1k, from scratch
aa: rand-m10-inc1-mstd101
amp: false
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 1.0
clip_mode: norm
color_jitter: null
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop: 0.1
drop_block: null
drop_connect: null
drop_path: 0.25
epoch_repeats: 0.0
epochs: 550
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.003
lr_base_scale: ''
lr_base_size: 4096
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 1.0e-06
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: maxvit_tiny_rw_256
model_ema: false
model_ema_decay: 0.9998
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 2
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 5.0e-07
weight_decay: 0.05
workers: 8
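All of these use sched: cosine with a linear warmup and a cooldown tail. A sketch of the per-epoch LR under settings like the maxvit_tiny_rw_256 config above (single cycle, no LR noise, lr_k_decay 1.0, warmup_prefix left at its default of false so the warmup overlaps the start of the cosine curve). It approximates my understanding of timm's CosineLRScheduler and is not that class; the resolved LR of 1.5e-3 assumes lr_base 0.003 at a global batch of 1024 under sqrt scaling.

```python
import math


def cosine_with_warmup(epoch, base_lr, epochs, warmup_epochs, warmup_lr, min_lr):
    """Approximate per-epoch LR for sched: cosine with linear warmup (single cycle)."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr towards base_lr.
        return warmup_lr + epoch * (base_lr - warmup_lr) / warmup_epochs
    if epoch >= epochs:
        # Cooldown tail (epochs .. epochs + cooldown_epochs - 1) stays at the floor.
        return min_lr
    # Cosine measured from epoch 0, so the warmup overlaps its start.
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * epoch / epochs))


# maxvit_tiny_rw_256 values: 550 epochs, 20 warmup, warmup_lr 5e-7, min_lr 1e-6
for e in (0, 10, 20, 275, 549, 555):
    print(e, cosine_with_warmup(e, 1.5e-3, 550, 20, 5e-7, 1e-6))
```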

# vit_base_patch16_rpn_224, ImageNet-1k, from scratch
aa: rand-m9-inc1-mstd1.0
amp: false
aug_splits: 0
batch_size: 256
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: null
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 100
decay_rate: 0.1
dist_bn: reduce
drop: 0.0
drop_block: null
drop_connect: null
drop_path: 0.2
epoch_repeats: 0.0
epochs: 500
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.0006
lr_base_scale: ''
lr_base_size: 512
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-07
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: vit_base_patch16_rpn_224
model_ema: false
model_ema_decay: 0.9998
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.2
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 5.0e-07
weight_decay: 0.05
workers: 4

# vit_base_patch16_224.augreg_in21k, fine-tune on ImageNet-1k (image folder)
aa: rand-m8-inc1-mstd101
amp: true
amp_dtype: float16
amp_impl: native
aot_autograd: false
apex_amp: false
aug_repeats: 0
aug_splits: 0
batch_size: 512
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 2.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 1.0
cutmix: 1.0
cutmix_minmax: null
data_dir: /data/imagenet/
dataset: ''
dataset_download: false
decay_epochs: 100
decay_milestones:
- 30
- 60
decay_rate: 0.1
dist_bn: reduce
drop: 0.0
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 50
eval_metric: top1
experiment: ''
fast_norm: false
fuser: ''
gp: null
grad_checkpointing: true
hflip: 0.5
img_size: null
in_chans: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: 0.7
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.0002
lr_base: 0.1
lr_base_scale: ''
lr_base_size: 256
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise:
- 0.1
- 0.9
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-07
mixup: 0.8
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: vit_base_patch16_224.augreg_in21k
model_ema: true
model_ema_decay: 0.9998
model_ema_force_cpu: false
momentum: 0.9
native_amp: false
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
num_classes: 1000
opt: adamw
opt_betas: null
opt_eps: null
output: ''
patience_epochs: 10
pin_mem: false
pretrained: true
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.3
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
sched_on_updates: true
seed: 42
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 10
warmup_lr: 0.0
warmup_prefix: true
weight_decay: 0.05
worker_seeding: all
workers: 8
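The fine-tune config above is the only one that sets layer_decay (0.7). Layer-wise LR decay gives each layer group a multiplier of layer_decay ** (max_layer - layer_id), so the head trains at the full LR and each group closer to the input gets another factor of 0.7. A sketch of those multipliers; the 14-group split for ViT-B/16 (stem, 12 blocks, final norm/head) is my assumption about how timm groups it.

```python
def layer_lr_scales(num_layers, layer_decay):
    """LR multiplier per layer group: layer_decay ** (distance from the top group)."""
    max_layer = num_layers - 1
    return [layer_decay ** (max_layer - layer_id) for layer_id in range(num_layers)]


# Assumed grouping for ViT-B/16: stem, 12 blocks, final norm/head -> 14 groups.
scales = layer_lr_scales(14, 0.7)
base_lr = 0.0002  # lr from the fine-tune config
print(f'stem lr = {base_lr * scales[0]:.2e}, head lr = {base_lr * scales[-1]:.2e}')
```

With 0.7 ** 13 being roughly 0.0097, the patch embedding ends up near 1.9e-6 while the head stays at 2e-4, which keeps the pretrained early layers close to their in21k weights.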

# vit_relpos_base_patch16_224, ImageNet-12k (11821 classes), from scratch
aa: rand-m9-inc1-mstd101
amp: false
aug_splits: 0
batch_size: 224
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 5.0
clip_mode: norm
color_jitter: null
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet12k
dataset: tfds/imagenet12k
dataset_download: false
decay_epochs: 100
decay_rate: 0.1
dist_bn: reduce
drop: 0.0
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 250
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
grad_checkpointing: false
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.0005
lr_base_scale: ''
lr_base_size: 512
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-07
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: vit_relpos_base_patch16_224
model_ema: false
model_ema_decay: 0.9998
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: 11821
opt: adamw
opt_betas: null
opt_eps: 1.0e-08
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.0
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: null
vflip: 0.0
warmup_epochs: 20
warmup_lr: 5.0e-07
weight_decay: 0.05
workers: 4
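The mixup / cutmix keys interact with smoothing roughly as follows, assuming timm's Mixup behaves as I understand it: with mixup_prob 1.0 every batch is mixed, CutMix is chosen with probability mixup_switch_prob (otherwise Mixup), and lam is drawn from Beta(alpha, alpha) with the matching alpha (0.8 or 0.2 for mixup, 1.0 for cutmix in these configs). With mixup_mode: batch, each sample is paired with its mirror in the flipped batch, and the target is a lam-weighted blend of the two label-smoothed one-hots. A pure-Python sketch of that target math; it omits the image mixing itself and the CutMix correction of lam to the actual box area.

```python
import random


def sample_mix(mixup_alpha=0.8, cutmix_alpha=1.0, switch_prob=0.5, rng=random):
    """Pick CutMix vs Mixup for a batch and draw lam ~ Beta(alpha, alpha)."""
    use_cutmix = rng.random() < switch_prob
    alpha = cutmix_alpha if use_cutmix else mixup_alpha
    return use_cutmix, rng.betavariate(alpha, alpha)


def smoothed_one_hot(label, num_classes, smoothing):
    # Smoothing spreads `smoothing` mass evenly over all classes.
    off_value = smoothing / num_classes
    on_value = 1.0 - smoothing + off_value
    return [on_value if c == label else off_value for c in range(num_classes)]


def mixup_targets(labels, num_classes, lam, smoothing=0.1):
    """Blend each sample's smoothed target with its partner in the flipped batch."""
    partners = labels[::-1]
    targets = []
    for a, b in zip(labels, partners):
        ya = smoothed_one_hot(a, num_classes, smoothing)
        yb = smoothed_one_hot(b, num_classes, smoothing)
        targets.append([lam * pa + (1.0 - lam) * pb for pa, pb in zip(ya, yb)])
    return targets


use_cutmix, lam = sample_mix(rng=random.Random(0))
print(use_cutmix, lam, mixup_targets([0, 2], num_classes=3, lam=lam))
```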