- 2011 - A trip through the Graphics Pipeline 2011
- 2013 - Performance Optimization Guidelines and the GPU Architecture behind them
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
As C programmers, most of us think of pointer arithmetic for multi-dimensional arrays in a nested way:
The address for a 1-dimensional array is base + x.
The address for a 2-dimensional array is base + x + y*x_size for row-major layout and base + y + x*y_size for column-major layout.
The address for a 3-dimensional array is base + x + (y + z*y_size)*x_size for row-column-major layout.
And so on.
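As a minimal sketch, the nested formulas above can be written as small C helpers. The function names are hypothetical and chosen only for illustration; the arithmetic is taken directly from the text, with offsets measured in elements relative to `base`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names; only the formulas come from the text. */

/* Row-major 2-D offset: x varies fastest. */
static inline size_t offset_2d(size_t x, size_t y, size_t x_size)
{
    return x + y * x_size;
}

/* Column-major 2-D offset: y varies fastest. */
static inline size_t offset_2d_colmajor(size_t x, size_t y, size_t y_size)
{
    return y + x * y_size;
}

/* 3-D offset in nested form: x fastest, then y, then z. */
static inline size_t offset_3d(size_t x, size_t y, size_t z,
                               size_t x_size, size_t y_size)
{
    return x + (y + z * y_size) * x_size;
}

/* Returns 1 if walking z, y, x in nested-loop order visits the
   offsets 0, 1, 2, ... consecutively, and 0 otherwise. */
static inline int offsets_are_contiguous(size_t x_size, size_t y_size,
                                         size_t z_size)
{
    size_t expected = 0;
    for (size_t z = 0; z < z_size; z++)
        for (size_t y = 0; y < y_size; y++)
            for (size_t x = 0; x < x_size; x++)
                if (offset_3d(x, y, z, x_size, y_size) != expected++)
                    return 0;
    return 1;
}
```

Calling `offsets_are_contiguous(4, 3, 2)` checks that the nested form enumerates a 4 x 3 x 2 array without gaps or overlaps, which is what lets a flat buffer stand in for the multi-dimensional one.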
Here's a list of mildly interesting things about the C language that I learned mostly by consuming Clang's ASTs. Although surprises are getting sparser, I might continue to update this document over time.
There are many more mildly interesting features of C++, but the language is literally known for being weird, whereas C is usually considered smaller and simpler, so this is (almost) only about C.
1. Combined type and variable/field declaration, inside a struct scope [https://godbolt.org/g/Rh94Go]
struct foo {
struct bar {
int x;

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).
Matrix multiplication is a mathematical operation that defines the product of
Why doesn't radfft support AVX on PC?
So there are two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.
Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang) that make all SSE code emitted also use VEX encoding, and at the time radfft was written there was no way in CDep to set compiler flags for just one file, just for the overall build.
[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]
The other issue is to do with CPU power management.
// ### [ Lexical part ] ########################################################
_ascii_letter_upper
  : 'A' - 'Z'
  ;
_ascii_letter_lower
  : 'a' - 'z'
  ;
- Intel 4004, first microprocessor: http://www.computerhistory.org/collections/catalog/102658187
- Intel 8008: http://www.computerhistory.org/collections/catalog/102657982
- Intel 8080: http://www.computerhistory.org/collections/catalog/102658123
- Z80: http://www.computerhistory.org/collections/catalog/102658073
- Federico Faggin, SGT inventor, chip designer for 4004, Z80: http://www.computerhistory.org/collections/catalog/102658025
- Bill Mensch, chip designer on 6800/6502/65C02/65816: http://www.computerhistory.org/collections/catalog/102739969
- Motorola 68000: http://www.computerhistory.org/collections/catalog/102658109
- 3dfx, Voodoo, the seminal GPU: http://www.computerhistory.org/collections/catalog/102746834
- LSI Logic, EDA/fabless innovator: http://www.computerhistory.org/collections/catalog/102746194
- VLSI Technologies, EDA/fabless innovator: http://www.computerhistory.org/collections/catalog/102746456
What is strict aliasing? First we will describe what aliasing is, and then we can learn what being strict about it means.
In C and C++, aliasing has to do with which expression types we are allowed to access stored values through. In both C and C++, the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type that is not allowed, it is classified as undefined behavior (UB). Once we have undefined behavior, all bets are off: the results of our program are no longer reliable.
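To make the rule concrete, here is a small C sketch. The helper names are hypothetical, and the example assumes that `float` and `uint32_t` have the same size (checked at compile time). It contrasts an access that breaks the rule, shown only in a comment, with two accesses that the C standard permits: copying the bytes with `memcpy`, and reading through a character type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: float is 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t),
               "sketch assumes a 32-bit float");

/* NOT allowed: reading a float object through a uint32_t lvalue
   violates the aliasing rules and is undefined behavior:

       uint32_t bits = *(uint32_t *)&f;
*/

/* Allowed: copy the object representation into an object of the
   target type. Hypothetical helper name. */
static inline uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Allowed: character types may be used to inspect the bytes of any
   object. Hypothetical helper name. */
static inline unsigned char first_byte(const float *f)
{
    return *(const unsigned char *)f;
}
```

On a platform that uses IEEE 754 single precision for `float`, `float_bits(1.0f)` yields `0x3F800000`. Compilers commonly optimize the `memcpy` version down to a plain register move, so the defined form usually costs nothing extra.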
Unfortunately, with strict aliasing violations we will often obtain the results we expect, leaving the possibility that a future version of a compiler with a new optimization will break code we th
For a brief user-level introduction to CMake, watch C++ Weekly, Episode 78, Intro to CMake by Jason Turner. LLVM’s CMake Primer provides a good high-level introduction to the CMake syntax. Go read it now.
After that, watch Mathieu Ropert’s CppCon 2017 talk Using Modern CMake Patterns to Enforce a Good Modular Design (slides). It provides a thorough explanation of what modern CMake is and why it is so much better than “old school” CMake. The modular design ideas in this talk are based on the book [Large-Scale C++ Software Design](https://www.amazon.de/Large-Scale-Soft
Papers I like Pt. 1 Papers I like Pt. 2
Let's start meta:
- Lamport - State the Problem Before Describing the Solution (1978). … 1-page memo. Read it.
- Herlihy - Wait-free synchronization (1991) … Truly seminal. Lucid + enough good ideas for 4 papers easily.
- Cook - How complex systems fail (1998) … 4 pages that anyone working on/with complex systems should read.