This file serves as a BKM (Best Known Methods) to get better performance on CPU for PyTorch, mostly focusing on inference or deployment. A Chinese version is available here.
Layout refers to how data is organized in a tensor. PyTorch's default layout is NCHW; from an optimization perspective, the MKL-DNN library (recently renamed DNNL) may choose a different layout, sometimes referred to as an internal layout or primitive layout. This is a common technique for acceleration libraries: it is well known that NHWC runs faster than NCHW for convolution, and changing the default NCHW to NHWC is called a reorder. MKL-DNN may choose different internal layouts based on the input pattern and the algorithm selected, e.g. nChw16c, which reorders a 4-dim tensor into a 5-dim one by blocking dimension C in chunks of 16, for vectorization purposes (an AVX-512 register is 512 bits wide, i.e. 16 x 32-bit lanes).
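As a minimal illustration (the tensor shapes here are arbitrary), PyTorch exposes both the channels-last (NHWC) memory format and the opaque MKL-DNN blocked layout directly on tensors:

```python
import torch

x = torch.randn(1, 64, 56, 56)  # default strided NCHW layout

# NHWC: same logical shape, different physical order (only strides change)
nhwc = x.contiguous(memory_format=torch.channels_last)

# reorder into the opaque MKL-DNN blocked layout (e.g. nChw16c)
blocked = x.to_mkldnn()
print(blocked.layout)   # torch._mkldnn

# reorder back to the default strided NCHW layout
dense = blocked.to_dense()
```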
By default on CPU, conv2d will run through MKL-DNN: the input and weight are reordered from NCHW to the internal blocked layout before the convolution, and the output is reordered back to NCHW afterwards, so each call pays reorder overhead.
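A sketch of how to amortize those reorders, using the `torch.utils.mkldnn` helper module that ships with PyTorch (the weight reorder happens once at conversion time; module and shapes here are arbitrary examples):

```python
import torch
from torch.utils import mkldnn as mkldnn_utils

model = torch.nn.Conv2d(64, 128, kernel_size=3, padding=1).eval()
x = torch.randn(1, 64, 56, 56)

with torch.no_grad():
    # default path: weight and input reordered to blocked layout on every call
    y_ref = model(x)

    # convert the module once: the weight is stored in MKL-DNN blocked layout,
    # and MKL-DNN input tensors flow through without per-call reorders
    model_mkldnn = mkldnn_utils.to_mkldnn(model)
    y = model_mkldnn(x.to_mkldnn()).to_dense()

print(torch.allclose(y, y_ref, atol=1e-5))  # True (up to numeric noise)
```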