Matt MattPD

GPU Optimization for GameDev

Graphics Pipeline / GPU Architecture Overview

2011 - A trip through the Graphics Pipeline 2011
2013 - Performance Optimization Guidelines and the GPU Architecture behind them
2015 - Life of a triangle - NVIDIA's logical pipeline
2015 - Render Hell 2.0
2016 - How bad are small triangles on GPU and why?
2017 - GPU Performance for Game Artists
2019 - Understanding the anatomy of GPUs using Pokémon

Multi-dimensional array views for systems programmers

As C programmers, most of us think of pointer arithmetic for multi-dimensional arrays in a nested way:

The address for a 1-dimensional array is base + x. The address for a 2-dimensional array is base + x + y*x_size for row-major layout and base + y + x*y_size for column-major layout. The address for a 3-dimensional array is base + x + (y + z*y_size)*x_size for row-column-major layout. And so on.
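Here is a minimal C sketch of the formulas above, written as offset functions. The names `offset_2d_row_major`, `offset_2d_col_major` and `offset_3d` are hypothetical, and the `assert` bounds checks are an addition. The 3-D version uses the same order C uses for a built-in array declared as `int a[z_size][y_size][x_size]`: x varies fastest, then y, then z.

```c
#include <assert.h>
#include <stddef.h>

/* Row-major 2-D: x varies fastest, so each row of x_size elements is contiguous. */
static size_t offset_2d_row_major(size_t x, size_t y, size_t x_size, size_t y_size) {
    assert(x < x_size && y < y_size);
    return x + y * x_size;
}

/* Column-major 2-D: y varies fastest, so each column of y_size elements is contiguous. */
static size_t offset_2d_col_major(size_t x, size_t y, size_t x_size, size_t y_size) {
    assert(x < x_size && y < y_size);
    return y + x * y_size;
}

/* 3-D, x fastest, then y, then z: the nested form base + x + (y + z*y_size)*x_size. */
static size_t offset_3d(size_t x, size_t y, size_t z,
                        size_t x_size, size_t y_size, size_t z_size) {
    assert(x < x_size && y < y_size && z < z_size);
    return x + (y + z * y_size) * x_size;
}
```

For example, in a 4 x 3 x 2 grid the element (x, y, z) = (3, 2, 1) lands at offset 3 + (2 + 1*3)*4 = 23, the last of its 24 elements.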

Here's a list of mildly interesting things about the C language that I learned mostly by consuming Clang's ASTs. Although surprises are getting sparser, I might continue to update this document over time.

There are many more mildly interesting features of C++, but the language is literally known for being weird, whereas C is usually considered smaller and simpler, so this is (almost) only about C.

1. Combined type and variable/field declaration, inside a struct scope [https://godbolt.org/g/Rh94Go]

struct foo {
    struct bar {
        int x;
    } baz;    /* field of type struct bar */
};
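Here is a self-contained sketch of the point behind this snippet, assuming a C (not C++) compiler. The nested declaration defines the type `struct bar` and the field `baz` at the same time. C gives struct tags no member scope, so `struct bar` remains usable outside `struct foo`. The field name `baz` and the helpers `bar_x` and `foo_bar_x` are illustrative names added only to demonstrate this.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    struct bar {      /* declares the type `struct bar`...                */
        int x;
    } baz;            /* ...and the field `foo.baz` in the same declaration */
};

/* In C, a tag declared inside a struct body is not scoped to that struct:
 * `struct bar` is visible at file scope here. */
static int bar_x(struct bar b) {
    return b.x;
}

static int foo_bar_x(const struct foo *f) {
    assert(f != NULL);
    return bar_x(f->baz);
}
```

The same code would not compile as C++. There the nested type is `foo::bar`, and `struct bar` at namespace scope names a different, incomplete type.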

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).

Intro

Matrix multiplication is a mathematical operation that defines the product of two matrices.
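As a reference point, here is the textbook definition C[m][n] = sum over k of A[m][k] * B[k][n] written as a plain triple loop in C. This is a baseline sketch only, not the optimized AVX2 kernel the post builds toward. The function name `matmul_naive` and the row-major, contiguous storage are assumptions. The loops run in m-k-n order so the innermost loop walks `B` and `C` with unit stride, which is a common first step before blocking and vectorization.

```c
#include <assert.h>
#include <stddef.h>

/* C (M x N) = A (M x K) * B (K x N); all matrices row-major and contiguous. */
static void matmul_naive(size_t M, size_t N, size_t K,
                         const float *A, const float *B, float *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < M * N; i++)
        C[i] = 0.0f;
    /* m-k-n order: A[m][k] is loaded once, then B's row k and C's row m
     * are both traversed with unit stride in the innermost loop. */
    for (size_t m = 0; m < M; m++)
        for (size_t k = 0; k < K; k++) {
            float a = A[m * K + k];
            for (size_t n = 0; n < N; n++)
                C[m * N + n] += a * B[k * N + n];
        }
}
```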

why doesn't radfft support AVX on PC?

So there are two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.

Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang), which also make all emitted SSE code use VEX encoding. At the time radfft was written, there was no way in CDep to set compiler flags for just one file, only for the overall build.

[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]
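For comparison, here is a sketch of the per-function alternative mentioned in the note, assuming GCC or Clang on x86-64. The `target("avx")` attribute compiles a single function as if AVX were enabled, and `__builtin_cpu_supports` picks the path at run time. This is not how radfft does it: the note says it sticks with separate compilation units. The function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Baseline path: plain C, compiled with the file's normal flags. */
static void add_f32_scalar(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* GCC/Clang `target` attribute: only this function is compiled with AVX enabled,
 * so AVX instructions and VEX encoding stay confined to it. VC++ has no equivalent. */
__attribute__((target("avx")))
static void add_f32_avx(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {               /* 8 floats per 256-bit vector */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)                         /* scalar tail */
        dst[i] = a[i] + b[i];
}

/* Runtime dispatch: take the AVX path only if the CPU reports AVX support. */
static void add_f32(float *dst, const float *a, const float *b, size_t n) {
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    if (__builtin_cpu_supports("avx"))
        add_f32_avx(dst, a, b, n);
    else
        add_f32_scalar(dst, a, b, n);
}
```

The same split works with separate files. Put `add_f32_avx` (without the attribute) in its own translation unit built with `-mavx` or `/arch:AVX`, and keep the dispatcher in a file built with baseline flags.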

The other issue is to do with CPU power management.

Intel 4004, first microprocessor: http://www.computerhistory.org/collections/catalog/102658187
Intel 8008: http://www.computerhistory.org/collections/catalog/102657982
Intel 8080: http://www.computerhistory.org/collections/catalog/102658123
Z80: http://www.computerhistory.org/collections/catalog/102658073
Federico Faggin, SGT inventor, chip designer for 4004, Z80: http://www.computerhistory.org/collections/catalog/102658025
Bill Mensch, chip designer on 6800/6502/65C02/65816: http://www.computerhistory.org/collections/catalog/102739969
Motorola 68000: http://www.computerhistory.org/collections/catalog/102658109
3dfx, Voodoo, the seminal GPU: http://www.computerhistory.org/collections/catalog/102746834
LSI Logic, EDA/fabless innovator: http://www.computerhistory.org/collections/catalog/102746194
VLSI Technology, EDA/fabless innovator: http://www.computerhistory.org/collections/catalog/102746456

What is the Strict Aliasing Rule and Why do we care?

(OR Type Punning, Undefined Behavior and Alignment, Oh My!)

What is strict aliasing? First we will describe what aliasing is, and then we can learn what being strict about it means.

In C and C++, aliasing has to do with which expression types we are allowed to access stored values through. In both C and C++, the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type that is not allowed, it is classified as undefined behavior (UB). Once we have undefined behavior all bets are off; the results of our program are no longer reliable.

Unfortunately, with strict aliasing violations we will often obtain the results we expect, leaving open the possibility that a future version of a compiler with a new optimization will break code we thought was valid.
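Here is a short C sketch of both sides of the rule. `store_and_reload` is the classic illustration: because `int` and `float` lvalues may not alias, a compiler is allowed to fold its return value to 1. `float_bits` shows the usual well-defined alternative to `*(uint32_t *)&f`, which copies the object representation with `memcpy`. Both function names are hypothetical. The bit pattern mentioned in the comment assumes the common IEEE 754 single-precision `float`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The compiler may assume the store through f cannot modify *i (int and float
 * may not alias), so it may return 1 without reloading *i. Passing pointers to
 * the same object here would be undefined behavior. */
int store_and_reload(int *i, float *f) {
    *i = 1;
    *f = 0.0f;
    return *i;
}

/* Well-defined type punning: copy the bytes instead of reading a float through
 * a uint32_t pointer. With IEEE 754 floats, float_bits(1.0f) is 0x3F800000. */
static uint32_t float_bits(float f) {
    static_assert(sizeof(uint32_t) == sizeof(float), "expects a 32-bit float");
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```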

Effective Modern CMake

Getting Started

For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.

After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft

Papers I like Pt. 1

Let's start meta:

Lamport - State the Problem Before Describing the Solution (1978). … 1-page memo. Read it.
Herlihy - Wait-free synchronization (1991) … Truly seminal. Lucid + enough good ideas for 4 papers easily.
Cook - How complex systems fail (1998) … 4 pages that anyone working on/with complex systems should read.

	// ### [ Lexical part ] ########################################################

	_ascii_letter_upper
	: 'A' - 'Z'
	;

	_ascii_letter_lower
	: 'a' - 'z'
	;