- 2011 - A trip through the Graphics Pipeline 2011
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
- 2020 - GPU ARCHITECTURE RESOURCES
struct vec2f {float x, y;};
struct vec3f {float x, y, z;};
//============================================================================
// cone_uniform_vector
//============================================================================
// Returns uniformly distributed unit vector on a [0, 0, 1] oriented cone of
// given apex angle and uniform random vector xi ([x, y] in range [0, 1]).
// e.g. cos_half_apex_angle = 0 returns samples on a hemisphere (cos(pi/2)=0),
// while cos_half_apex_angle = -1 returns samples on a sphere (cos(pi)=-1)
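The comment above describes the function's contract but the body is not shown here. A minimal sketch of one common way to meet that contract follows, assuming the usual inverse-CDF mapping where cos(theta) is spread uniformly over [cos_half_apex_angle, 1]. The name cone_uniform_vector_sketch and its parameter order are hypothetical and are not taken from the original source.

```cpp
#include <cmath>

struct vec2f {float x, y;};
struct vec3f {float x, y, z;};

// Hypothetical sketch, not the original implementation.
// xi_.x and xi_.y are uniform random numbers in [0, 1].
// cos_half_apex_angle_ is the cosine of half the cone apex angle.
inline vec3f cone_uniform_vector_sketch(const vec2f &xi_, float cos_half_apex_angle_)
{
  // Uniform in solid angle: cos(theta) is uniform on [cos_half_apex_angle_, 1].
  float cos_theta = 1.0f - xi_.x * (1.0f - cos_half_apex_angle_);
  // Clamp before the square root to guard against tiny negative values.
  float sin_theta = std::sqrt(std::fmax(0.0f, 1.0f - cos_theta * cos_theta));
  // The azimuth is uniform on [0, 2*pi].
  float phi = 6.28318530718f * xi_.y;
  return vec3f{std::cos(phi) * sin_theta, std::sin(phi) * sin_theta, cos_theta};
}
```

With cos_half_apex_angle_ = 0 the z component stays in [0, 1] (hemisphere), and with -1 it covers [-1, 1] (full sphere), which matches the two examples in the comment.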
struct vec3f {float x, y, z;};
struct vec4f {float x, y, z, w;};
struct mat44f {vec4f x, y, z, w;};
//============================================================================
// sphere_screen_extents
//============================================================================
// Calculates the exact screen extents xyzw=[left, bottom, right, top] in
// normalized screen coordinates [-1, 1] for a sphere in view space. For
// performance, the projection matrix (v2p) is assumed to be set up so that
Please refer to this blogpost to get an overview.
Replace *-INSTANCE with one of the public instances listed in the scrapers section. Replace CAPITALIZED words with their corresponding identifiers on the website.
In shader programming, you often run into a problem where you want every pixel in a compute shader group (tile) to iterate over an array in memory. Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]
// NOTE: Must bind 8x single mip RWTexture views, because HLSL doesn't have .mips member for RWTexture2D. (SRVs only have .mips member)
// NOTE: globallycoherent attribute is needed. Without it writes aren't guaranteed to be seen by other groups
globallycoherent RWTexture2D<float> MipTextures[8];
RWTexture2D<uint> Counters[8];
groupshared uint CounterReturnLDS;
[numthreads(16, 16, 1)]
void GenerateMipPyramid(uint3 Tid : SV_DispatchThreadID, uint3 Group : SV_GroupId, uint Gix : SV_GroupIndex)
{
    [unroll]
#include <stdio.h>
#include <math.h>
float max(float x, float y) {
    return x > y ? x : y;
}
class vec3 {
public:
    float x;
struct FloatBits
{
    u32 mantissa : 23;
    u32 exponent : 8;
    u32 sign : 1;
};
template <typename ResultT, typename InputT>
inline ResultT bitCast(InputT v)
{
Yesterday I posted a problem to Math Stack Exchange that has bothered me for a while now, and right after a few exchanges on Twitter, I got inspired to attempt a solution.
Here it goes. It's 100% untested but I'm fairly certain that it will work.
The problem is about a form of refining raytracing where we render a big list of convex 3D brushes (and I decided to start with ellipsoids, since they're so useful) to the screen or a shadow map, without any prebuilt acceleration structure. How does it work? Well, if we had a way to figure out for a portion of the frustum whether it contained a brush, we could
- Start with a very low resolution
// From https://github.com/google/filament
float D_GGX(float linearRoughness, float NoH, const vec3 h) {
    // Walter et al. 2007, "Microfacet Models for Refraction through Rough Surfaces"
    // In mediump, there are two problems computing 1.0 - NoH^2
    // 1) 1.0 - NoH^2 suffers floating point cancellation when NoH^2 is close to 1 (highlights)
    // 2) NoH doesn't have enough precision around 1.0
    // Both problems can be fixed by computing 1-NoH^2 in highp and providing NoH in highp as well
    // However, we can do better using Lagrange's identity:
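The snippet above is cut off right where the identity is introduced. Lagrange's identity states that |a x b|^2 = |a|^2 |b|^2 - (a . b)^2, so for unit vectors n and h the quantity 1 - NoH^2 equals |n x h|^2, which avoids subtracting two nearly equal numbers. The C++ sketch below only illustrates that equivalence. The vec3 type and the helper names are hypothetical stand-ins, and this is not Filament's shader code.

```cpp
#include <cmath>

// Hypothetical minimal vector type for illustration only.
struct vec3 { float x, y, z; };

inline float dot(const vec3 &a, const vec3 &b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline vec3 cross(const vec3 &a, const vec3 &b) {
    return vec3{a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x};
}

// Direct form: subtracts two values that are nearly equal when NoH is close to 1.
inline float one_minus_noh2_direct(const vec3 &n, const vec3 &h) {
    float NoH = dot(n, h);
    return 1.0f - NoH * NoH;
}

// Lagrange form: for unit n and h, |n x h|^2 equals 1 - (n . h)^2,
// computed without the cancelling subtraction.
inline float one_minus_noh2_lagrange(const vec3 &n, const vec3 &h) {
    vec3 NxH = cross(n, h);
    return dot(NxH, NxH);
}
```

Both functions assume n and h are already normalized; for non-unit inputs the identity picks up the |n|^2 |h|^2 factor and the two results differ.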