Comparing some LayerNorm implementations for 2D (rank-4, NCHW) tensors via ConvNeXt models on a 3090 and a V100.
All runs done with native torch AMP, PyTorch 1.12 cu113.
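For context, the inference side of the measurements implies something like the sketch below (this is not the actual harness used for these runs; batch size, image size, and iteration counts are placeholders):

```python
import time
import torch

def bench_inference(model, batch_size=256, img_size=224, iters=50):
    # Warm up, then time forward passes under native torch AMP; returns images/sec.
    model = model.cuda().eval()
    x = torch.randn(batch_size, 3, img_size, img_size, device='cuda')
    with torch.no_grad(), torch.cuda.amp.autocast():
        for _ in range(10):  # warmup
            model(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        torch.cuda.synchronize()
    return batch_size * iters / (time.perf_counter() - start)
```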
Some column descriptions:
- fmt - PyTorch memory_format
- cg - full model codegen (one of torchscript, aot, eager (none))
- layer - the LayerNorm impl
- permute - `F.layer_norm(x.permute(0, 2, 3, 1), self.normalized_shape, self.weight, self.bias, self.eps).permute(0, 3, 1, 2)` (see the module sketch after this list)
- rw_hack - my hacky layer - https://github.com/rwightman/pytorch-image-models/blob/a45b4bce9a022d413eb27a342a7a9997580bb4aa/timm/models/layers/norm.py#L54
- rw_hack_ts - my hacky layer torchscripted (just the `_layer_norm_cf` part)
- ng_hack - Natalia's (@ngimel) modifications to my hack (for better nvfuser performance) - huggingface/pytorch-image-models#1340
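For reference, the `permute` variant above corresponds to a module roughly like this sketch (the `LayerNorm2d` wrapper name is just illustrative; only the forward expression is taken from the list above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNorm2d(nn.LayerNorm):
    """LayerNorm over the channel dim of NCHW tensors via permute to NHWC and back."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.layer_norm(
            x.permute(0, 2, 3, 1), self.normalized_shape, self.weight, self.bias, self.eps
        ).permute(0, 3, 1, 2)
```

e.g. `LayerNorm2d(96)` normalizes an N x 96 x H x W tensor over its channel dim.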
Models with `_c` use the `conv_mlp=True` arg to use 1x1 convs instead of nn.Linear for the MLP blocks. While for ConvNeXt they are usually worse across the board, this mode better represents the mix of layers found in some other models where this LN is useful... (where it could be a kxk conv, 2D pool layers, stride > 1, etc. mixed in the sequence).
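As a usage illustration (assuming `timm.create_model` of that era passes extra kwargs through to the ConvNeXt constructor; treat the kwarg pass-through as an assumption):

```python
import timm

# conv_mlp=True swaps the nn.Linear MLP blocks for 1x1 convs (the `_c` variants);
# kwarg pass-through via create_model is assumed here.
model_linear_mlp = timm.create_model('convnext_tiny')
model_conv_mlp = timm.create_model('convnext_tiny', conv_mlp=True)
```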
CSVs w/ `all` combine all codegen modes, while those w/ `eager` show just eager (with the torchscripted layer included only for rw_hack). Sorted by inference and train throughputs...
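A hedged sketch of reproducing that ordering from the CSVs (the filename and throughput column names here are assumptions; adjust them to the actual headers):

```python
import pandas as pd

# Hypothetical filename and column names; match them to the real CSVs.
df = pd.read_csv('ln_bench_eager.csv')
by_infer = df.sort_values('infer_samples_per_sec', ascending=False)
by_train = df.sort_values('train_samples_per_sec', ascending=False)
print(by_infer[['layer', 'fmt', 'cg', 'infer_samples_per_sec']].head())
```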