- 2011 - A trip through the Graphics Pipeline 2011
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
- 2020 - GPU ARCHITECTURE RESOURCES
- 2020 - All the pipelines - journey through the GPU
- Emil Persson @Humus
- Matt Pettineo @mynameismjp
- Blog
- 2018 - Breaking Down Barriers
- 2021 - The Shader Permutation Problem
- Louis Bavoil @louisbavoil
- 2018 - The Peak-Performance-Percentage Analysis Method for Optimizing Any GPU Workload
- 2018 - Fixing the Hyperdrive: Maximizing Rendering Performance on NVIDIA GPUs (Presented by NVIDIA)
- 2019 - Optimizing DX12/DXR GPU Workloads using Nsight Graphics: GPU Trace and the Peak-Performance-Percentage (P3) Method (Presented by NVIDIA)
- 2020 - Optimizing Compute Shaders for L2 Locality using Thread-Group ID Swizzling
- 2021 - Dana Elifaz - The Next Level of Optimization Advice with Nsight Graphics: GPU Trace
- 2022 - (GDC Paywall) Optimizing Ray Tracing GPU Workloads using Nsight Graphics: GPU Trace and Nsight Systems
- D3D11 Vendor Hacks
- Rys Sommefeldt @ryszu
- Michal Drobot @michaldrobot
- Kostas Anagnostou @KostasAAA
- Matthäus G. Chajdas @NIV_Anteru
- Blog
- 2018 - Introduction to compute shaders
- 2018 - More compute shaders
- 2018 - Even more compute shaders
- GPU database
- Matthijs De Smedt @anji_nl
- 2016 - PC GPU Performance Hot Spots
- Maurizio Cerrato @speedwago
- 2019 - GPU Architectures
- Sebastian Aaltonen @SebAaltonen
- Layla Mah @MissQuickstep
- Sven Andersson @andsve
- Blog
- 2014 - Real-time Rendering Blogs
- Fabian Giesen @rygorous
- Timothy Lottes
- Robert Menzel @renderpipeline
- Blog
- 2012 - Low-Level GPU Documentation
- RasterGrid @rastergrid
- Blog
- 2021 - Understanding GPU caches
- Adam Sawicki @Reg__
- Matías N. Goldberg @matiasgoldberg
- Francesco Cifariello Ciardi @FCifaCiar
- Blog
- 2018 - INTRO TO GPU SCALARIZATION
- Sébastien Lagarde @SebLagarde
- Bart Wronski @BartWronsk
- Elizabeth Baumel @Icetigris
- Anton Schreiner @antonschrein
- AMD
- GPU Open and Talks
- Events Presentations
- AMD GPU ISA documentation (GCN, Vega, CDNA, RDNA, RDNA2)
- 2014 - Vertex Shader Tricks
- 2016 - Leveraging asynchronous queues for concurrent execution
- 2016 - AMD GCN Assembly: Cross-Lane Operations
- 2017 - Wave Programming in D3D12 and Vulkan
- 2017 - D3D12 and Vulkan Done Right
- 2017 - Deep Dive: Asynchronous Compute
- 2018 - Optimize your engine using compute @ 4C Prague 2018 | (YouTube)
- 2018 - Optimization with Radeon GPU Profiler - A Vulkan Case Study
- 2019 - DirectX 12 Optimization Techniques in Capcom’s RE ENGINE
- 2019 - A BLEND OF GCN OPTIMIZATION AND COLOR PROCESSING
- 2019 - AMD GPU Performance Revealed
- 2019 - Triangles Are Precious
- 2020 - Let’s build
- AMD Ryzen™ Processor Software Optimization
- Optimizing for the Radeon™ RDNA Architecture
- From Source to ISA: A Trip Down the Shader Compiler Pipeline
- A Review of GPUOpen Effects
- Curing Amnesia and Other GPU Maladies With AMD Developer Tools
- Radeon™ ProRender Full Spectrum Rendering 2.0: The Universal Rendering API
- 2020 - CONCURRENCY MODEL IN EXPLICIT GRAPHICS APIS
- GCN
- RDNA
- 2019 - INTRODUCING RDNA ARCHITECTURE
- 2019 - RDNA Architecture
- 2020 - "RDNA 1.0" Instruction Set Architecture
- 2020 - "RDNA 2" Instruction Set Architecture
- 2020 - RDNA2 Performance Guide
- 2022 - "RDNA3" Instruction Set Architecture
- Driver
- OpenCL
- Radeon GPU Analyzer / Radeon Raytracing Analyzer
- GPU Open and Talks
- NVIDIA
- Developer Blog and Talks
- 2012 - GPU Performance Analysis and Optimization
- 2015 - Constant Buffers without Constant Pain
- 2016 - Practical DirectX 12
- 2016 - Reading Between The Threads: Shader Intrinsics
- 2016 - DX12 Do's And Don'ts
- 2016 - High-Performance, Low-Overhead Rendering with OpenGL and Vulkan
- 2019 - Tips and Tricks: Ray Tracing Best Practices
- 2020 - Optimizing Graphics Applications using Nsight Systems and Nsight Graphics
- 2020 - RTX Ray Tracing Best Practices
- 2021 - Advanced API Performance
- 2022 - Best Practices for Using NVIDIA RTX Ray Tracing (Updated)
- 2023 - Practical Tips for Optimizing Ray Tracing
- 2023 - Advanced API Performance: Shaders
- Pascal
- Turing
- Ampere
- Ada
- 2022 - NVIDIA ADA GPU ARCHITECTURE
- 2022 - SHADER EXECUTION REORDERING
- CUDA
- Developer Blog and Talks
- Intel
- Microsoft
- Khronos Group
- Arm
- GDC
- Advanced Graphics Summit talks, not specifically on optimization
- 2016 - Optimizing the Graphics Pipeline With Compute
- (JP) CEDEC
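- 2016 - GPU最適化入門 (Introduction to GPU Optimization)
- (Book) マンガとイラストでわかる! GPU最適化入門 (Introduction to GPU Optimization, Explained with Manga and Illustrations)
- 2016 - GPU最適化入門 (Introduction to GPU Optimization)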
- Siggraph
- Advances in Real-Time Rendering in Games, not specifically on optimization
- 2009 - From Shader Code to a Teraflop: How Shader Cores Work
- 2020 - LOW-LEVEL OPTIMIZATIONS IN THE LAST OF US PART II
- CMU
- PerfTest: GPU shader memory operation performance test tool (with results)
- GPUInfo for Vulkan, OpenGL, OpenGL ES
- (JP) GPU Spec Database by HYPERでんち
- Online Shader Compiler
- Compiler Explorer (godbolt), supports DXC and AMD RGA
- Shader Playground, supports DXC, FXC, glslang, hlsl2glsl, hlslparser, IntelShaderAnalyzer, AMD RGA, slang, and XShaderCompiler
- Microsoft
- NVIDIA - NVIDIA Developer Tools
- AMD - Radeon Developer Tool Suite
- Intel
Thanks to JoseEmilio-ARM for the Arm section.