- 2011 - A trip through the Graphics Pipeline 2011
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
- 2020 - GPU ARCHITECTURE RESOURCES
//
// C++ implementation of "A simple method to construct isotropic quasirandom blue
// noise point sequences"
//
// http://extremelearning.com.au/a-simple-method-to-construct-isotropic-quasirandom-blue-noise-point-sequences/
//
// Assume 0 <= x
static double myfmod(double x) { return x - std::floor(x); }
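The preview stops at the fractional-part helper, so here is a minimal sketch of how such a helper is typically used. It generates the plain R2 low-discrepancy sequence, which I assume is the base sequence of the linked article. The jitter step that gives the sequence its blue-noise character is deliberately left out. The names `frac` and `r2_points` are hypothetical and do not come from the gist.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Fractional part of x. Like the helper above, this assumes 0 <= x.
static double frac(double x) {
    assert(x >= 0.0);
    return x - std::floor(x);
}

// Plain (un-jittered) R2 sequence, an assumption about the base sequence:
// point n is frac(0.5 + n * (1/g, 1/g^2)), where g is the plastic constant,
// the real root of g^3 = g + 1.
static std::vector<std::pair<double, double>> r2_points(int count) {
    assert(count >= 0);
    const double g = 1.32471795724474602596;
    const double a1 = 1.0 / g;
    const double a2 = 1.0 / (g * g);
    std::vector<std::pair<double, double>> pts;
    pts.reserve(static_cast<std::size_t>(count));
    for (int n = 0; n < count; ++n) {
        pts.emplace_back(frac(0.5 + a1 * n), frac(0.5 + a2 * n));
    }
    return pts;
}
```

Calling `r2_points(16)` returns 16 points in the unit square, starting at (0.5, 0.5).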
All current buffer types in shading languages are slightly different ways to present homogeneous arrays (a single struct or type repeated N times in memory).
DirectX has raw buffers (RWByteAddressBuffer), but they are limited to 32-bit integer types, and the implementation doesn't require natural alignment for wide loads, resulting in suboptimal codegen on Nvidia GPUs.
Complex use cases, such as tree traversal in spatial data structures (physics, ray tracing, etc.), require a non-homogeneous data structure. You want different node payloads and a tight memory layout.
The ability to mix 8/16/32-bit data types and 1d/2d/4d vectors in the same data structure, to facilitate GPU wide loads (max bandwidth), is crucial for complex use cases like this.
On the other hand, we want more readable and maintainable code syntax than DirectX raw buffers, without manual bit packing/extracting and reinterpret casting. The goal should be to allow modern GPUs to use sub-register addressing (SDWA on AMD hardware). Saving both ALU and register
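To make the "non-homogeneous, tightly packed" idea concrete, here is a CPU-side C++ sketch of a byte buffer that mixes 8-, 16- and 32-bit fields and has a variable payload per node. It only illustrates the layout problem and is not a proposal for shader syntax. The node layout and the names `NodeHeader`, `load`, `store`, `write_inner` and `write_leaf` are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Load a trivially copyable T from an arbitrary byte offset (no reinterpret_cast).
template <typename T>
static T load(const std::vector<std::uint8_t>& buf, std::size_t offset) {
    assert(offset + sizeof(T) <= buf.size());
    T value;
    std::memcpy(&value, buf.data() + offset, sizeof(T));
    return value;
}

// Append a value's bytes to the buffer; returns the offset it was written at.
template <typename T>
static std::size_t store(std::vector<std::uint8_t>& buf, const T& value) {
    const std::size_t offset = buf.size();
    buf.resize(offset + sizeof(T));
    std::memcpy(buf.data() + offset, &value, sizeof(T));
    return offset;
}

// Hypothetical node layout: a 4-byte header shared by all nodes, followed by
// a 32-bit float payload only for leaf nodes.
constexpr std::uint8_t kInner = 0;
constexpr std::uint8_t kLeaf = 1;

struct NodeHeader {
    std::uint8_t kind;         // kInner or kLeaf
    std::uint8_t childCount;   // used by inner nodes
    std::uint16_t firstChild;  // index of the first child, used by inner nodes
};
static_assert(sizeof(NodeHeader) == 4, "header is expected to pack into one 32-bit word");

static std::size_t write_inner(std::vector<std::uint8_t>& buf,
                               std::uint8_t childCount, std::uint16_t firstChild) {
    return store(buf, NodeHeader{kInner, childCount, firstChild});
}

static std::size_t write_leaf(std::vector<std::uint8_t>& buf, float payload) {
    const std::size_t offset = store(buf, NodeHeader{kLeaf, 0, 0});
    store(buf, payload);  // the payload sits right after the header
    return offset;
}
```

An inner node takes 4 bytes and a leaf takes 8, so a homogeneous array of the larger struct would waste space on every inner node.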
// http://bearcave.com/misl/misl_tech/wavelets/index.html
class WaveletBase {
  constructor() {
    this.forward = 1;
    this.inverse = 2;
  }
  split(vec, N) {
    var half = N >> 1;
    var vc = vec.slice();
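The preview cuts off inside `split`, so the following is not a reconstruction of that function. It is a C++ sketch of the split step as it is usually defined in the lifting scheme: even-indexed samples move to the first half and odd-indexed samples to the second half. That this matches the gist's behavior is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Lifting-scheme split over the first n entries of vec (n must be even):
// even-indexed elements go to the first half, odd-indexed elements to the
// second half, each keeping their relative order.
static void split(std::vector<double>& vec, std::size_t n) {
    assert(n % 2 == 0 && n <= vec.size());
    const std::size_t half = n >> 1;
    const std::vector<double> copy(vec.begin(),
                                   vec.begin() + static_cast<std::ptrdiff_t>(n));
    for (std::size_t i = 0; i < half; ++i) {
        vec[i] = copy[2 * i];             // even sample
        vec[half + i] = copy[2 * i + 1];  // odd sample
    }
}
```

Working from a copy keeps the sketch simple. An in-place version would need a sequence of swaps instead.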
#pragma once
#include <stdint.h>
#include <string.h>
#define WAVELET_DIM 512
extern void wavelet_forward_2d(uint8_t *mat, size_t N);
extern void wavelet_inverse_2d(uint8_t *mat, size_t N);
#pragma use_dxc //enable SM 6.0 features, in Unity this is only supported on version 2020.2.0a8 or later with D3D12 enabled
#pragma kernel CountTotalsInBlock
#pragma kernel BlockCountPostfixSum
#pragma kernel CalculateOffsetsForEachKey
#pragma kernel FinalSort
uint _FirstBitToSort;
int _NumElements;
int _NumBlocks;
bool _ShouldSortPayload;
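Going only by the kernel names, this looks like a radix sort split into count, scan, offset and scatter stages. The kernel bodies are not shown, so that reading is a guess. Under that assumption, here is a single-threaded C++ reference for the same count, prefix sum and scatter structure, useful for checking GPU output. The 4-bit digit width and the names `radix_pass` and `radix_sort` are assumptions, not taken from the shader.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr unsigned kBitsPerPass = 4;  // assumed digit width
constexpr unsigned kBuckets = 1u << kBitsPerPass;

// One stable counting-sort pass on the digit that starts at bit `firstBit`.
static std::vector<std::uint32_t> radix_pass(const std::vector<std::uint32_t>& keys,
                                             unsigned firstBit) {
    assert(firstBit + kBitsPerPass <= 32);
    // 1. Count how many keys fall into each bucket.
    std::array<std::size_t, kBuckets> counts{};
    for (std::uint32_t k : keys) {
        ++counts[(k >> firstBit) & (kBuckets - 1)];
    }
    // 2. Exclusive prefix sum turns counts into each bucket's start offset.
    std::array<std::size_t, kBuckets> offsets{};
    std::size_t running = 0;
    for (unsigned b = 0; b < kBuckets; ++b) {
        offsets[b] = running;
        running += counts[b];
    }
    // 3. Scatter keys to their destination, preserving input order per bucket.
    std::vector<std::uint32_t> out(keys.size());
    for (std::uint32_t k : keys) {
        out[offsets[(k >> firstBit) & (kBuckets - 1)]++] = k;
    }
    return out;
}

// Full LSD radix sort: repeat the pass from the lowest digit to the highest.
static std::vector<std::uint32_t> radix_sort(std::vector<std::uint32_t> keys) {
    for (unsigned bit = 0; bit < 32; bit += kBitsPerPass) {
        keys = radix_pass(keys, bit);
    }
    return keys;
}
```

A GPU version would compute the counts per block and scan across blocks. This sketch collapses that into one global histogram.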
A quick breakdown of lighting in the restir-meets-surfel branch of my renderer, where I revive some olde surfel experiments, and generously sprinkle ReSTIR on top.
Please note that this is all based on work-in-progress experimental software, and represents a single snapshot in development history. Things will certainly change 😛
Due to how I'm capturing this, there's frame-to-frame variability, e.g. different rays being shot, TAA shimmering slightly. Some of the images come from a dedicated visualization pass, and are anti-aliased, and some show internal buffers which are not anti-aliased.
// Public Domain under http://unlicense.org, see link for details.
// except:
// * core-math function `cr_cbrtf` (see license below)
// * musl flavored fdlib function `fdlibm_cbrtf` (see license below)
// code and test driver for cube root and its reciprocal based on:
// "Fast Calculation of Cube and Inverse Cube Roots Using
// a Magic Constant and Its Implementation on Microcontrollers"
// Moroz, Samotyy, Walczyk, Cieslinski, 2021
// (PDF: https://www.mdpi.com/1996-1073/14/4/1058)
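For readers unfamiliar with the magic-constant idea, here is a minimal sketch of the general technique. It is not the paper's tuned method and does not use its constants. The constant `0x2A555555` is simply two thirds of the bit pattern of `1.0f`, derived from the observation that a float's bit pattern is roughly a scaled and biased log2. The plain Newton refinement and the name `fast_cbrt` are my own assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Approximate cube root for positive, normal floats.
// Initial guess: dividing the bit pattern by 3 roughly divides log2(x) by 3;
// adding (2/3) * bits(1.0f) = 0x2A555555 restores the exponent bias.
// This untuned constant is an assumption, not the constant from the paper.
static float fast_cbrt(float x) {
    assert(x > 0.0f && std::isnormal(x));
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits = bits / 3u + 0x2A555555u;
    float y;
    std::memcpy(&y, &bits, sizeof y);
    // Two Newton steps on f(y) = y^3 - x: y <- (2y + x / y^2) / 3.
    for (int i = 0; i < 2; ++i) {
        y = (2.0f * y + x / (y * y)) / 3.0f;
    }
    return y;
}
```

The initial guess is within a few percent of the true root, and each Newton step roughly squares the relative error, so two steps are enough for about four correct digits. A tuned constant and a higher-order refinement, as in the paper, would presumably do better for the same cost.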