@rwightman
Last active July 11, 2023 08:36

Some hparams related to RegNets (and other nets) in the TPU training series: https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-tpu-weights

All models were trained on x8 TPUs, so global batch size == batch_size * 8.
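As a quick illustration of that rule, here is how the per-device batch_size values that appear in the configs in this gist map to global batch sizes. The constant and helper names are made up for this sketch.

```python
# Per-device batch_size -> global batch size, assuming 8 devices as stated above.
NUM_DEVICES = 8  # the "x8" from the note above


def global_batch(batch_size: int, num_devices: int = NUM_DEVICES) -> int:
    """Global batch size when every device processes `batch_size` samples per step."""
    return batch_size * num_devices


# batch_size values that appear in the configs in this gist
for bs in (64, 112, 128, 160, 192):
    print(f"batch_size={bs:>3} -> global batch {global_batch(bs)}")
```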

If the weight name contains ra3, the recipe was rmsproptf + mixup + cutmix + rand erasing + (usually) lr noise + rand-aug + head dropout + drop path (stochastic depth). The older ra2 scheme was very similar, but had no cutmix and always used normal sampling (mstd0.5 or mstd1.0) for the rand-aug magnitude, whereas ra3 often (not always) uses uniform sampling (mstd101).
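To make the aa strings in the configs easier to read, here is a small sketch that splits one into its fields. It is an illustrative parser, not timm's own code, and it assumes that any mstd value above 100 is the marker for uniform sampling (mstd101 is the only such value used here). The function name and the "sampling" key are hypothetical.

```python
def parse_rand_aug(spec: str) -> dict:
    """Split a 'rand-...' policy string into its fields (illustrative only)."""
    parts = spec.split("-")
    if parts[0] != "rand":
        raise ValueError("expected a rand-aug policy string")
    out = {}
    for part in parts[1:]:
        # Each field is a letter prefix followed by a number, e.g. 'm10' or 'mstd1.0'.
        key = part.rstrip("0123456789.")
        out[key] = float(part[len(key):])
    # Assumption: mstd > 100 (e.g. mstd101) means uniform magnitude sampling,
    # a smaller positive mstd means normal sampling around the magnitude m.
    std = out.get("mstd", 0.0)
    out["sampling"] = "uniform" if std > 100 else ("normal" if std > 0 else "fixed")
    return out


print(parse_rand_aug("rand-m9-mstd1.0-inc1"))       # ra2-style string
print(parse_rand_aug("rand-m10-n3-mstd101-inc1"))   # ra3-style string
```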

Some weights were trained with sgd + grad clipping (cx in the name, where x is one of h, 1, 2, 3); h = amped-up augreg.
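Two of the sgd configs in this gist set clip_grad: 1.0 with clip_mode: norm. A minimal plain-Python sketch of what norm-based clipping does is below; it is not the training script's code, and the function name, the toy gradients, and the eps value are assumptions for illustration.

```python
import math


def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale all gradients by one shared factor so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Only ever shrink; gradients already inside the limit are left unchanged.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [[g * scale for g in vec] for vec in grads], total_norm


grads = [[3.0, 0.0], [0.0, 4.0]]  # toy gradients for two parameter tensors
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
print(norm_before, clipped)
```

Because every gradient is multiplied by the same factor, the update direction is preserved and only its length is capped.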

I believe the 064 regnety scored very close with both the ra3 and sgd approaches. The hparams I have kept are the sgd ones, but I believe the published weights were the rmsproptf ones, which edged ahead by a hair.

aa: rand-m9-mstd1.0-inc1
amp: false
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: null
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 1.0
decay_rate: 0.988
dist_bn: reduce
drop: 0.3
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 600
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd: false
jsd_loss: false
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.1
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise:
- 0.45
- 1.0
lr_noise_pct: 0.45
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-06
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: xxx
model_ema: true
model_ema_decay: 0.99995
model_ema_force_cpu: false
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: rmsproptf
opt_betas: null
opt_eps: 0.001
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.35
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: step
seed: 21
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: 125
validation_batch_size_multiplier: 1
vflip: 0.0
warmup_epochs: 10
warmup_lr: 1.0e-06
weight_decay: 7.0e-06
workers: 4
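For a feel of what sched: step with decay_epochs: 1.0 and decay_rate: 0.988 does over a long run, here is a rough per-epoch sketch using the values from the config above. It assumes a linear warmup and that the decay exponent counts epochs from 0 (warmup included); lr noise, cooldown, and any LR scaling done by the training script are ignored, so treat the numbers as approximate.

```python
def step_lr(epoch, base_lr=0.1, warmup_lr=1e-6, warmup_epochs=10,
            decay_epochs=1.0, decay_rate=0.988):
    """Approximate LR at an epoch: linear warmup, then multiply by decay_rate every decay_epochs."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr toward base_lr.
        return warmup_lr + epoch * (base_lr - warmup_lr) / warmup_epochs
    # Assumption: the exponent counts epochs from 0, warmup included.
    return base_lr * decay_rate ** int(epoch // decay_epochs)


for epoch in (0, 5, 10, 100, 300, 599):
    print(f"epoch {epoch:>3}: lr ~ {step_lr(epoch):.6f}")
```

With a decay applied every epoch, the schedule is effectively a smooth exponential decay rather than a few large steps.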
---
aa: rand-m10-n3-mstd101-inc1
amp: true
apex_amp: false
aug_splits: 0
batch_size: 128
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
clip_grad: null
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: null
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
decay_epochs: 1.0
decay_rate: 0.988
dist_bn: reduce
drop: 0.2
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 600
eval_metric: top1
experiment: ''
gp: null
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd: false
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.064
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_noise:
- 0.45
- 1.0
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 1.0e-05
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: xxx
model_ema: true
model_ema_decay: 0.99997
model_ema_force_cpu: false
momentum: 0.9
native_amp: false
no_aug: false
no_prefetcher: false
no_resume_opt: false
num_classes: null
opt: rmsproptf
opt_betas: null
opt_eps: 0.001
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.67
- 1.5
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.35
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: step
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size_multiplier: 1
vflip: 0.0
warmup_epochs: 5
warmup_lr: 1.0e-06
weight_decay: 7.0e-06
workers: 6
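The model_ema_decay values (0.99997 above, 0.99995 elsewhere) are easier to compare as an averaging horizon. The sketch below shows the standard exponential moving average update and the usual 1 / (1 - decay) rule of thumb; the function names are hypothetical, and whether the update runs once per optimizer step is an assumption here.

```python
def ema_update(ema, weights, decay=0.99997):
    """One EMA step over a flat list of weights: ema <- decay * ema + (1 - decay) * w."""
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema, weights)]


def ema_horizon(decay):
    """Rough number of updates the average looks back over, 1 / (1 - decay)."""
    return 1.0 / (1.0 - decay)


for decay in (0.99995, 0.99997):
    print(f"decay={decay}: horizon ~ {round(ema_horizon(decay))} updates")
```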
---
aa: rand-m10-mstd101-n3-inc1
amp: false
aug_splits: 0
batch_size: 160
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 1.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 0.95
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 1.0
decay_rate: 0.988
dist_bn: reduce
drop: 0.27
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 650
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd: false
jsd_loss: false
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.525
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.45
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-06
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: regnety_064
model_ema: false
model_ema_decay: 0.99995
model_ema_force_cpu: false
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: sgd
opt_betas: null
opt_eps: 1.0e-06
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.35
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 21
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: 125
validation_batch_size_multiplier: 1
vflip: 0.0
warmup_epochs: 10
warmup_lr: 1.0e-06
weight_decay: 2.0e-05
workers: 4
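The sgd configs use sched: cosine instead of step. A rough sketch of that shape with the regnety_064 values above follows. It assumes a single half-cosine over `epochs` (lr_cycle_limit: 1, lr_k_decay: 1.0), a linear warmup that overrides the first warmup_epochs, and the LR simply held at min_lr during cooldown; the real scheduler may differ in those details.

```python
import math


def cosine_lr(epoch, epochs=650, base_lr=0.525, min_lr=5e-6,
              warmup_lr=1e-6, warmup_epochs=10):
    """Approximate LR at an epoch: linear warmup, then one half-cosine down to min_lr."""
    if epoch < warmup_epochs:
        return warmup_lr + epoch * (base_lr - warmup_lr) / warmup_epochs
    t = min(epoch, epochs)  # clamp so cooldown epochs stay at min_lr
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t / epochs))


for epoch in (0, 10, 162, 325, 487, 650, 659):
    print(f"epoch {epoch:>3}: lr ~ {cosine_lr(epoch):.6f}")
```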
---
aa: rand-m9-mstd1.0-inc1
amp: false
aug_splits: 0
batch_size: 192
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: null
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 0.95
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 1.0
decay_rate: 0.988
dist_bn: reduce
drop: 0.25
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 600
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd: false
jsd_loss: false
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.15
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise:
- 0.45
- 1.0
lr_noise_pct: 0.45
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-06
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: regnetz_d8
model_ema: true
model_ema_decay: 0.99995
model_ema_force_cpu: false
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: rmsproptf
opt_betas: null
opt_eps: 0.001
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.35
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: step
seed: 21
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: 125
validation_batch_size_multiplier: 1
vflip: 0.0
warmup_epochs: 10
warmup_lr: 1.0e-06
weight_decay: 7.0e-06
workers: 4
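The lr_noise settings are the least self-explanatory part of these configs. The sketch below reflects my reading of them and should be treated as an approximation: lr_noise: [0.45, 1.0] is taken as fractions of `epochs` giving the window where noise is active, and the noise itself is a Gaussian draw (scaled by lr_noise_std) that is resampled until it falls within +/- lr_noise_pct, then applied multiplicatively. The function names are hypothetical.

```python
import random


def noise_window(lr_noise, epochs):
    """Turn fractional [start, end] into epoch numbers."""
    return [frac * epochs for frac in lr_noise]


def noisy_lr(lr, epoch, window, noise_pct=0.45, noise_std=1.0, rng=random):
    """Perturb lr by up to +/- noise_pct (relative) when epoch is inside the window."""
    start, end = window
    if not (start <= epoch < end):
        return lr
    # Rejection-sample a Gaussian until it lands inside +/- noise_pct.
    while True:
        noise = rng.gauss(0.0, noise_std)
        if abs(noise) < noise_pct:
            return lr + lr * noise


window = noise_window([0.45, 1.0], epochs=600)
print("noise active for epochs", window)
rng = random.Random(0)  # seeded only so the sketch is repeatable
print(noisy_lr(0.01, 100, window, rng=rng))  # before the window: unchanged
print(noisy_lr(0.01, 300, window, rng=rng))  # inside the window: perturbed
```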
---
aa: rand-m10-mstd101-n3-inc1
amp: false
aug_splits: 0
batch_size: 64
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: null
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 0.95
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
dataset_download: false
decay_epochs: 1.0
decay_rate: 0.988
dist_bn: reduce
drop: 0.25
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 650
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd: false
jsd_loss: false
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.25
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.45
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-06
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: regnetz_d8_evos
model_ema: false
model_ema_decay: 0.99995
model_ema_force_cpu: false
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: sgd
opt_betas: null
opt_eps: 1.0e-06
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.35
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 21
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: 125
validation_batch_size_multiplier: 1
vflip: 0.0
warmup_epochs: 10
warmup_lr: 1.0e-06
weight_decay: 2.0e-05
workers: 4
---
aa: rand-m10-mstd101-n3-inc1
amp: false
aug_splits: 0
batch_size: 128
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
class_map: ''
clip_grad: 1.0
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 0.95
cutmix: 1.0
cutmix_minmax: null
data_dir: /imagenet
dataset: ''
dataset_download: false
decay_epochs: 1.0
decay_rate: 0.988
dist_bn: reduce
drop: 0.25
drop_block: null
drop_connect: null
drop_path: 0.1
epoch_repeats: 0.0
epochs: 650
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd: false
jsd_loss: false
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.4
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.45
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-06
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: regnetz_d8
model_ema: false
model_ema_decay: 0.99995
model_ema_force_cpu: false
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: sgd
opt_betas: null
opt_eps: 1.0e-06
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: cosine
seed: 21
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: 125
validation_batch_size_multiplier: 1
vflip: 0.0
warmup_epochs: 10
warmup_lr: 1.0e-05
weight_decay: 2.0e-05
workers: 4
---
aa: rand-m8-n3-mstd1.0-inc1
amp: false
aug_splits: 0
batch_size: 112
bce_loss: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
clip_grad: null
clip_mode: norm
color_jitter: 0.4
cooldown_epochs: 10
crop_pct: 0.95
cutmix: 1.0
cutmix_minmax: null
data_dir: gs://xxx-imagenet
dataset: tfds/imagenet2012:5.0.0
decay_epochs: 1.0
decay_rate: 0.988
dist_bn: reduce
drop: 0.5
drop_block: null
drop_connect: null
drop_path: 0.12
epoch_repeats: 0.0
epochs: 600
eval_metric: top1
experiment: ''
force_cpu: false
gp: null
hflip: 0.5
img_size: null
initial_checkpoint: ''
input_size: null
interpolation: ''
jsd: false
jsd_loss: false
local_rank: 0
log_interval: 50
log_wandb: false
lr: 0.1
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise:
- 0.45
- 1.0
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 5.0e-06
mixup: 0.2
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: regnetz_e8
model_ema: true
model_ema_decay: 0.99995
model_ema_force_cpu: false
momentum: 0.9
no_aug: false
no_resume_opt: false
num_classes: null
opt: rmsproptf
opt_betas: null
opt_eps: 0.001
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
ratio:
- 0.75
- 1.3333333333333333
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.0
resplit: false
resume: ''
save_images: false
scale:
- 0.08
- 1.0
sched: step
seed: 0
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
torchscript: false
train_interpolation: random
train_split: train
tta: 0
use_multi_epochs_loader: false
val_split: validation
validation_batch_size: 125
validation_batch_size_multiplier: 1
vflip: 0.0
warmup_epochs: 10
warmup_lr: 1.0e-06
weight_decay: 7.0e-06
workers: 4