Ross Wightman (rwightman)


comparing ln

Comparing some LayerNorm implementations for 2D (rank-4, NCHW) tensors via ConvNeXt models on an RTX 3090 and a V100.

All runs were done with native torch AMP on PyTorch 1.12 + cu113.

Column descriptions (a sketch of one such LayerNorm variant follows the list):

  • fmt - PyTorch memory_format
  • cg - full model codegen (one of torchscript, aot, eager (none))
  • layer - the LayerNorm impl
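
For context, a minimal sketch of one common way to apply LayerNorm over the channel dim of an NCHW tensor (permute to NHWC, normalize over the last dim, permute back). This is illustrative only, not necessarily one of the exact impls benchmarked:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNorm2d(nn.LayerNorm):
    """LayerNorm over the channel dim of an NCHW tensor via NHWC permute."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.permute(0, 2, 3, 1)  # NCHW -> NHWC
        x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        return x.permute(0, 3, 1, 2)  # NHWC -> NCHW

# usage: norm = LayerNorm2d(96); y = norm(torch.randn(2, 96, 56, 56))
```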
@rwightman
rwightman / BENCHMARK.md
Last active July 12, 2023 09:40
timm model benchmark compare

NCHW and NHWC benchmark numbers for some common image classification models in timm.

For NCHW: python benchmark.py --model-list model.txt --amp -b 128

For NHWC: python benchmark.py --model-list model.txt --amp -b 128 --channels-last

Note the test resolutions for efficientnet_b1/b2/b3/b4 and regnety_160 were adjusted in timm to match the original papers rather than the timm defaults. The benchmark script is in the root of timm: https://github.com/rwightman/pytorch-image-models/blob/master/benchmark.py
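
The --channels-last flag corresponds to running the model and inputs in NHWC memory format. A minimal sketch of that conversion (the model name, batch size, and resolution below are illustrative assumptions, not the benchmark.py internals):

```python
import torch
import timm

# Illustrative model; the actual benchmark sweeps the models listed in model.txt.
model = timm.create_model('efficientnet_b0', pretrained=False).cuda().eval()
model = model.to(memory_format=torch.channels_last)  # NHWC weights/activations

x = torch.randn(128, 3, 224, 224, device='cuda').to(memory_format=torch.channels_last)
with torch.no_grad(), torch.cuda.amp.autocast():  # matches the --amp runs
    out = model(x)
```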

@rwightman
rwightman / standalone_bifpn.py
Last active January 27, 2023 06:09
Use effdet BiFPN standalone
from typing import Callable, Union
from dataclasses import dataclass
import timm
import torch.nn as nn
from effdet.efficientdet import BiFpn
from effdet.config import fpn_config
from omegaconf import DictConfig
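
A minimal sketch of the general pattern this gist enables: build a timm features_only backbone, describe its feature levels, and hand them to BiFpn. Assumptions here: BiFpn accepts an efficientdet config plus per-level feature info (dicts with num_chs / reduction), and get_efficientdet_config is used in place of the gist's own dataclass/DictConfig setup; the backbone name and out_indices are illustrative.

```python
import timm
from effdet import get_efficientdet_config
from effdet.efficientdet import BiFpn

# Backbone producing features at reductions 8, 16, 32 (P3-P5 inputs).
backbone = timm.create_model(
    'resnet50', features_only=True, out_indices=(2, 3, 4), pretrained=False)

# Per-level feature description in the form effdet expects (assumed).
feature_info = [
    dict(num_chs=c, reduction=r)
    for c, r in zip(backbone.feature_info.channels(), backbone.feature_info.reduction())
]

config = get_efficientdet_config('efficientdet_d0')
bifpn = BiFpn(config, feature_info)  # forward takes the list of backbone feature maps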
@rwightman
rwightman / r50.yaml
Last active June 27, 2021 09:42
low aug resnet50 trials for PyTorch XLA test
aa: null
amp: false
aug_splits: 0
batch_size: 256
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
clip_grad: 1.0
@rwightman
rwightman / MLP_hparams.md
Last active June 28, 2021 12:48
MLP model training hparams w/ timm bits and PyTorch XLA on TPU VM

Using TPU VM instance w/ pre-alpha timm bits setup as per: https://github.com/rwightman/pytorch-image-models/tree/bits_and_tpu/timm/bits#readme

python3 launch_xla.py --num-devices 8 train.py gs://my-imagenet --config hparams.yaml

Note the config yaml files include args that are not used or active, depending on other overriding code or the state of the current training code. The bits code is under heavy development, so these configs will likely need a specific revision (currently https://github.com/rwightman/pytorch-image-models/commit/5e95ced5a7763541f7219f35fd155e3fbfe66e8b)

The gMlp hparams are the last (latest) in the series and likely will produce better results than the earlier gmixer / resmlp variants...

Note, for adapting the LR to a different batch size: AdamW is being used here, and I use sqrt scaling of the learning rate w.r.t. the (global) batch size. I typically use linear LR scaling w/ SGD or RMSProp for most from-scratch training. A sketch of both rules follows.
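
A minimal sketch of the two scaling rules described above. The base LR / base batch size values are illustrative assumptions, not values from these configs:

```python
import math

def scale_lr(base_lr: float, base_batch: int, global_batch: int, rule: str = 'sqrt') -> float:
    """Scale a reference LR tuned at base_batch to a new global batch size."""
    ratio = global_batch / base_batch
    if rule == 'sqrt':      # used here with AdamW
        return base_lr * math.sqrt(ratio)
    if rule == 'linear':    # typical for SGD / RMSProp from-scratch training
        return base_lr * ratio
    raise ValueError(rule)

# e.g. a reference LR of 1e-3 tuned at batch 512, run at global batch 2048:
print(scale_lr(1e-3, 512, 2048, 'sqrt'))    # 2e-3
print(scale_lr(1e-3, 512, 2048, 'linear'))  # 4e-3
```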

@rwightman
rwightman / effres-agc.yaml
Last active June 24, 2021 23:51
timm config for training an NFNet; load with the --config arg, and override batch size / lr for your number of GPUs / distributed nodes
aa: rand-m6-n5-inc1-mstd1.0
amp: false
apex_amp: false
aug_splits: 0
batch_size: 256
bn_eps: null
bn_momentum: null
bn_tf: false
channels_last: false
checkpoint_hist: 10
@rwightman
rwightman / timm_unet.py
Created April 15, 2021 19:12
An example U-Net using timm features_only functionality.
""" A simple U-Net w/ timm backbone encoder
Based off an old version of Unet in https://github.com/qubvel/segmentation_models.pytorch
Hacked together by Ross Wightman
"""
from typing import Optional, List
import torch
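
A minimal sketch of the features_only encoder pattern the U-Net builds on; the backbone name and out_indices below are illustrative assumptions, not necessarily the gist's defaults:

```python
import torch
import timm

# Multi-scale feature extractor: returns a list of feature maps, shallow -> deep.
encoder = timm.create_model(
    'resnet34', features_only=True, out_indices=(0, 1, 2, 3, 4), pretrained=False)

x = torch.randn(1, 3, 224, 224)
features = encoder(x)
print(encoder.feature_info.channels())   # per-level channel counts
print([f.shape[-1] for f in features])   # decreasing spatial resolution

# A U-Net decoder then upsamples the deepest map stage by stage and
# concatenates the shallower maps as skip connections.
```
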
| img_size | ngc2102 nocl | ngc2102 cl | pt-181 nocl | pt-181 cl | ngc2012 nocl | ngc2012 cl | ngc2103 nocl | ngc2103 cl |
|---------:|-------------:|-----------:|------------:|----------:|-------------:|-----------:|-------------:|-----------:|
| 128 | 3323.06 | 1180.6  | 3494.51 | 3561.77 | 3616.33 | 3534.56 | 3585.48 | 3609.14 |
| 132 | 3114.9  | 1199.74 | 3037.34 | 3519.81 | 3336.5  | 3460.3  | 3357.58 | 3508.03 |
| 136 | 3272.05 | 1204.48 | 2995.19 | 3574    | 3227.72 | 3435.07 | 3328.46 | 3424.46 |
| 140 | 3200.35 | 1207.76 | 2803.09 | 3587.26 | 3185.1  | 3415.24 | 3221.43 | 3471.07 |
| 144 | 3194.24 | 1220.19 | 2973.52 | 3683.47 | 3205.12 | 3420.44 | 3220.51 | 3454.69 |
| 148 | 2942.87 | 1218.09 | 2573.74 | 2900.56 | 2895.25 | 3431.24 | 2964.88 | 3508.71 |
| 152 | 2886.33 | 1191.09 | 2557.25 | 3043.76 | 2854.47 | 3518.21 | 2986.86 | 3500.32 |
| 156 | 2879.16 | 1190.3  | 2652.08 | 2945.3  | 2807.7  | 3538.78 | 2952.99 | 3497.47 |
| 160 | 2654.9  | 1213.99 | 2711.74 | 2822.56 | 2748.02 | 3536.74 | 2834.89 | 3504.37 |
@rwightman
rwightman / bench_by_infer.csv
Created March 6, 2021 06:22
PyTorch Bench (1.8, 1.7.1, NGC 21.02, NGC 20.12)
| model | gpu | env | cl | infer_samples_per_sec | infer_step_time | infer_batch_size | train_samples_per_sec | train_step_time | train_batch_size | param_count | img_size |
|---|---|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
| efficientnet_b0 | rtx3090  | ngc2102     | True | 7179.22 | 0.139 | 512 | 1628.51 | 0.609 | 256 | 5.29 | 224 |
| efficientnet_b0 | rtx3090  | ngc2012     | True | 6527.77 | 0.153 | 512 | 1504.58 | 0.654 | 256 | 5.29 | 224 |
| efficientnet_b0 | v100_32  | ngc2102     | True | 6496.56 | 0.154 | 512 | 1556.66 | 0.638 | 512 | 5.29 | 224 |
| efficientnet_b0 | rtx3090  | 1.7.1cu11.0 | True | 6020.3  | 0.166 | 512 | 1266.03 | 0.785 | 512 | 5.29 | 224 |
| efficientnet_b0 | rtx3090  | 1.8cu11.1   | True | 5979.7  | 0.167 | 512 | 1286.76 | 0.775 | 512 | 5.29 | 224 |
| efficientnet_b0 | v100_32  | ngc2012     | True | 5666.05 | 0.176 | 512 | 1459.05 | 0.676 | 512 | 5.29 | 224 |
| efficientnet_b0 | v100_32  | 1.8cu11.1   | True | 5529.09 | 0.181 | 512 | 1444.02 | 0.688 | 512 | 5.29 | 224 |
| efficientnet_b0 | v100_32  | 1.7.1cu11.0 | True | 5526.07 | 0.181 | 512 | 1425.38 | 0.691 | 512 | 5.29 | 224 |
| efficientnet_b0 | titanrtx | ngc2102     | True | 5118.38 | 0.195 | 512 | 1156.83 | 0.862 | 512 | 5.29 | 224 |
@rwightman
rwightman / image_folder_tar.py
Created July 24, 2019 05:01
PyTorch ImageFolder style dataset for reading directly from tarfile
import torch.utils.data as data
import os
import re
import torch
import tarfile
from PIL import Image
IMG_EXTENSIONS = ['.png', '.jpg', '.jpeg']
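
A minimal sketch of the core idea behind such a dataset (not the gist's full Dataset class): index the image members of a tar once, then decode them lazily with PIL. The tar path below is illustrative:

```python
import tarfile
from PIL import Image

IMG_EXTENSIONS = ('.png', '.jpg', '.jpeg')

# Build an index of image members up front; keep the tarfile handle open
# so samples can be read directly without extracting to disk.
tf = tarfile.open('/path/to/images.tar')  # illustrative path
members = [m for m in tf.getmembers()
           if m.isfile() and m.name.lower().endswith(IMG_EXTENSIONS)]

# Lazily decode one sample from the tar stream.
img = Image.open(tf.extractfile(members[0])).convert('RGB')
```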