This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).
Matrix multiplication is a mathematical operation that defines the product of