By Angel Leon, March 17, 2015.
Last updated on December 14, 2023.
Updated on February 27, 2023.
Updated on August 29, 2019.
This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of a Skylake-client CPU with AVX2, but the principles in this post also apply to processors with other instruction sets (such as AVX-512).
Matrix multiplication is a mathematical operation that defines the product of two matrices.
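To make the definition concrete: for an M x K matrix A and a K x N matrix B, each entry of the M x N product is C[i][j] = sum over k of A[i][k] * B[k][j]. The sketch below is the straightforward triple-loop version in C. It assumes row-major storage in flat arrays, and the function name `matmul_naive` and its signature are hypothetical choices for this illustration. It is the unoptimized baseline, not a high-performance kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Naive matrix multiplication C = A * B (hypothetical helper for illustration).
 * A is M x K, B is K x N, C is M x N; all three are row-major flat arrays. */
void matmul_naive(size_t M, size_t N, size_t K,
                  const float *A, const float *B, float *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M; i++) {          /* each row of C */
        for (size_t j = 0; j < N; j++) {      /* each column of C */
            float acc = 0.0f;                 /* dot product of row i of A and column j of B */
            for (size_t k = 0; k < K; k++) {
                acc += A[i * K + k] * B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}
```

For example, multiplying the 2x3 matrix [[1, 2, 3], [4, 5, 6]] by the 3x2 matrix [[7, 8], [9, 10], [11, 12]] gives [[58, 64], [139, 154]]. The innermost loop walks down a column of B with stride N, so consecutive iterations touch memory far apart. This access pattern is one reason such naive code is slow on modern processors.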
```makefile
SDK = xcrun -sdk macosx

.PHONY: all
all: compute.metallib compute

compute.metallib: Compute.metal
	# Metal intermediate representation (.air)
	$(SDK) metal -c -Wall -Wextra -std=osx-metal2.0 -o /tmp/Compute.air $^
	# Metal library (.metallib)
	$(SDK) metallib -o $@ /tmp/Compute.air
```
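```jsx
import React, { Component } from 'react';
import styled from 'styled-components';

// Fixed-aspect-ratio box: with height 0, the percentage padding-bottom
// (resolved against the containing block's width) sets the figure's height,
// so `ratio` is height / width expressed as a percentage.
const Figure = styled.figure`
  height: 0;
  margin: 0;
  background-color: #efefef;
  position: relative;
  padding-bottom: ${props => props.ratio}%;
`;
```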
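```css
.root {
  display: block;
  position: relative;
}

/* Low-quality image placeholder (LQIP) */
.lqip {
  image-rendering: pixelated;  /* upscale with hard pixel edges instead of smoothing */
  width: 100%;
  opacity: 1;
  transition: opacity 50ms 100ms ease-out;  /* 50ms fade, starting after a 100ms delay */
}
```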
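```shell
# Lower matmult.mlir to the LLVM dialect.
# Pass names have changed across MLIR releases; adjust them to match your build.
mlir-opt matmult.mlir \
  -convert-linalg-to-loops \
  -lower-affine \
  -convert-scf-to-cf \
  -convert-linalg-to-llvm \
  -convert-memref-to-llvm \
  -convert-func-to-llvm \
  -reconcile-unrealized-casts > out.mlir

# JIT-compile out.mlir and run its `main` function (which returns nothing),
# loading the MLIR runner utility library.
mlir-cpu-runner out.mlir -O3 -e main -entry-point-result=void \
  --shared-libs=libmlir_runner_utils.dylib
```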