- Emil Persson @Humus
- Matt Pettineo @mynameismjp
- Blog
- <2018> Breaking Down Barriers
- <2021> The Shader Permutation Problem
- Louis Bavoil @louisbavoil
- <2018> The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload
- <2018> Fixing the Hyperdrive: Maximizing Rendering Performance on NVIDIA GPUs (Presented by NVIDIA)
- <2019> Optimizing DX12/DXR GPU Workloads using Nsight Graphics: GPU Trace and the Peak-Performance-Percentage (P3) Method (Presented by NVIDIA)
- <2020> Optimizing Compute Shaders for L2 Locality using Thread-Group ID Swizzling
- D3D11 Vendor Hacks
- Rys Sommefeldt @ryszu
- Michal Drobot @michaldrobot
- Kostas Anagnostou @KostasAAA
- Matthäus G. Chajdas @NIV_Anteru
- Blog
- <2018> Introduction to compute shaders
- <2018> More compute shaders
- <2018> Even more compute shaders
- GPU database
- Matthijs De Smedt @anji_nl
- <2016> PC GPU Performance Hot Spots
- Maurizio Cerrato @speedwago
- <2019> GPU Architectures
- Sebastian Aaltonen @SebAaltonen
- Layla Mah @MissQuickstep
- Sven Andersson @andsve
- Blog
- <2014> Real-time Rendering Blogs
- Fabian Giesen @rygorous
- Timothy Lottes
- <2016> GCN Memory Coalescing
- <2017> ADVANCED SHADER PROGRAMMING ON GCN
- <2018> Engine Optimization Hot Lap
- Robert Menzel @renderpipeline
- Blog
- <2012> Low-Level GPU Documentation
- Stephanie Hurlburt @sehurlburt
- RasterGrid @rastergrid
- Blog
- <2021> Understanding GPU caches
- GDC
- Search "Advanced Graphics" in GDC Vault or in GDC VAULT EXPLORER
- <2014> Vertex Shader Tricks
- <2016> Optimizing the Graphics Pipeline With Compute
- <2016> High-Performance, Low-Overhead Rendering with OpenGL and Vulkan
- <2016> Practical DirectX 12
- <2017> Wave Programming in D3D12 and Vulkan
- <2017> D3D12 and Vulkan Done Right
- <2017> Deep Dive: Asynchronous Compute
- <2019> DirectX 12 Optimization Techniques in Capcom’s RE ENGINE
- <2019> A BLEND OF GCN OPTIMIZATION AND COLOR PROCESSING
- <2019> AMD GPU Performance Revealed
- Siggraph
- AMD
- GPU Open
- Events Presentations
- <2016> Leveraging asynchronous queues for concurrent execution
- <2018> Optimize your engine using compute @ 4C Prague 2018 | (YouTube)
- <2018> Optimization with Radeon GPU Profiler - A Vulkan Case Study
- <2019> Triangles Are Precious
- <2020> Let’s build
- AMD Ryzen™ Processor Software Optimization
- Optimizing for the Radeon™ RDNA Architecture
- From Source to ISA: A Trip Down the Shader Compiler Pipeline
- A Review of GPUOpen Effects
- Curing Amnesia and Other GPU Maladies With AMD Developer Tools
- Radeon™ ProRender Full Spectrum Rendering 2.0: The Universal Rendering API
- <2020> CONCURRENCY MODEL IN EXPLICIT GRAPHICS APIS
- <2020> All the Pipelines - Journey through the GPU
- GCN
- RDNA
- <2019> INTRODUCING RDNA ARCHITECTURE
- <2019> RDNA Architecture
- <2020> "RDNA 1.0" Instruction Set Architecture
- <2020> RDNA Performance Guide
- <2020> "RDNA 2" Instruction Set Architecture
- OpenCL
- RADEON GPU ANALYZER
- GPU Open
- Nvidia
- Developer Blog
- Pascal
- Turing
- CUDA
- GTC
- Intel
- Microsoft
- Arm
- Khronos Group
- CMU
- Misc
- <2009> From Shader Code to a Teraflop: How Shader Cores Work
- <2016> [JP] GPU最適化入門
- <2017> Demystifying Asynchronous Compute
- <2019> Unity GPU culling experiments
- <2019> What's up with my branch on GPU?
- <2011> A trip through the Graphics Pipeline 2011
- <2015> Life of a triangle - NVIDIA's logical pipeline
- <2015> Render Hell 2.0
- <2016> How bad are small triangles on GPU and why?
- <2017> GPU Performance for Game Artists
- <2019> Understanding the anatomy of GPUs using Pokémon
- <2020> Graphics Studies Compilation
- [WIP] Unreal Art Optimization
- GPU shader memory operation performance test
- GPUInfo for Vulkan, OpenGL, OpenGL ES
- [JP] GPU Spec Database by HYPERでんち
Thanks to JoseEmilio-ARM for the Arm part.