This file serves as a BKM to get better performance on CPU for PyTorch, mostly focusing on inference or deployment. Chinese version available here.
`layout` refers to how data is organized in a tensor. The PyTorch default layout is `NCHW`; from an optimization perspective, the MKL-DNN library (recently renamed DNNL) may choose a different layout, sometimes referred to as an internal layout or primitive layout. This is a common technique for acceleration libraries: it is well known that `NHWC` runs faster than `NCHW` for convolution, and changing the default `NCHW` to `NHWC` is called a `reorder`. MKL-DNN may choose different internal layouts based on the input pattern and the selected algorithm, e.g. `nChw16c`, which reorders a 4-dim tensor into 5 dims by blocking dimension C by 16, for vectorization purposes (an AVX512 register is 512 bits wide, i.e. 16 x 32-bit floats).
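A minimal sketch of what a reorder looks like from the Python side, assuming a PyTorch build with MKL-DNN enabled (the default on x86 wheels): `Tensor.to_mkldnn()` reorders a dense tensor into MKL-DNN's opaque internal layout, and `Tensor.to_dense()` reorders it back.

```python
import torch

x = torch.randn(1, 64, 56, 56)   # default strided NCHW layout
print(x.layout)                  # torch.strided

# reorder into MKL-DNN's opaque internal layout (e.g. nChw16c on AVX512)
y = x.to_mkldnn()
print(y.layout)                  # torch._mkldnn

# reorder back to the default strided NCHW layout
z = y.to_dense()
```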
By default on CPU, `conv2d` will run MKL-DNN: internally the input and weight are reordered to a blocked layout such as `nChw16c` and the output is reordered back to `NCHW`, so the reorders happen on every call.
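To avoid paying that reorder cost per operator, the tensors can be kept in MKL-DNN layout across the whole model with `torch.utils.mkldnn.to_mkldnn`, which converts supported modules (e.g. `Conv2d`, `Linear`) so their weights stay in the internal layout. A minimal inference-only sketch; the model and shapes are illustrative:

```python
import torch
from torch.utils import mkldnn as mkldnn_utils

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, kernel_size=3),
    torch.nn.ReLU(),
).eval()

# convert the weights to MKL-DNN layout once, instead of reordering every call
model = mkldnn_utils.to_mkldnn(model)

x = torch.randn(1, 3, 224, 224).to_mkldnn()  # reorder the input once
with torch.no_grad():
    y = model(x).to_dense()                  # reorder the output back to NCHW
```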