taher nullhook

Things to remember when compiling/linking C/C++ software

by Angel Leon. March 17, 2015

Last update on December 14, 2023

Updated on February 27, 2023

Updated August 29, 2019.

High-Performance Matrix Multiplication

This short post explains how to write a high-performance matrix multiplication program for modern processors. In this tutorial I use a single core of a Skylake-client CPU with AVX2, but the same principles apply to other processors with different instruction sets (such as AVX-512).

Intro

Matrix multiplication is a mathematical operation that defines the product of two matrices.
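As a point of reference, the operation itself can be sketched as a plain triple loop in C. This is only a minimal, unoptimized baseline and not the tuned kernel the post goes on to describe; the function name `matmul_naive` and the row-major storage layout are assumptions made for illustration.

```c
#include <stddef.h>

/* Naive reference: C (m x n) = A (m x k) * B (k x n), all row-major.
   The function name and the row-major layout are assumptions for
   illustration, not taken from the original post. */
void matmul_naive(size_t m, size_t n, size_t k,
                  const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            /* Dot product of row i of A with column j of B. */
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++) {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = acc;
        }
    }
}
```

A loop nest like this is a useful correctness check against which a vectorized version can later be compared.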

	SDK = xcrun -sdk macosx

	all: compute.metallib compute

	compute.metallib: Compute.metal
		# Metal intermediate representation (.air)
		$(SDK) metal -c -Wall -Wextra -std=osx-metal2.0 -o /tmp/Compute.air $^
		# Metal library (.metallib)
		$(SDK) metallib -o $@ /tmp/Compute.air

	import React, { Component } from 'react';
	import styled from 'styled-components';

	const Figure = styled.figure`
	  height: 0;
	  margin: 0;
	  background-color: #efefef;
	  position: relative;
	  padding-bottom: ${props => props.ratio}%;
	`;

	.root {
	  display: block;
	  position: relative;
	}

	.lqip {
	  image-rendering: pixelated;
	  width: 100%;
	  opacity: 1;
	  transition: opacity 50ms 100ms ease-out;

	mlir-opt matmult.mlir -convert-linalg-to-loops -lower-affine -convert-scf-to-cf -convert-linalg-to-llvm -convert-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts > out.mlir
	mlir-cpu-runner out.mlir -O3 -e main -entry-point-result=void --shared-libs=libmlir_runner_utils.dylib