randallrvr

GPU Optimization for GameDev

Graphics Pipeline / GPU Architecture Overview

2011 - A trip through the Graphics Pipeline 2011
2015 - Life of a triangle - NVIDIA's logical pipeline
2015 - Render Hell 2.0
2016 - How bad are small triangles on GPU and why?
2017 - GPU Performance for Game Artists
2019 - Understanding the anatomy of GPUs using Pokémon
2020 - GPU ARCHITECTURE RESOURCES

The RSS Endpoint List

Please refer to this blogpost to get an overview.

Replace *-INSTANCE with one of the public instances listed in the scrapers section. Replace CAPITALIZED words with their corresponding identifiers on the website.

Social Media

Twitter

Ellipsoid Frustum Intersection

Yesterday I posted a problem to Math Stack Exchange that has bothered me for a while now, and right after a few exchanges on Twitter, I got inspired to attempt a solution.

Here it goes. It's 100% untested but I'm fairly certain that it will work.

The problem is about a form of refining raytracing where we render a big list of convex 3D brushes (I decided to start with ellipsoids, since they're so useful) to the screen or a shadow map, without any prebuilt acceleration structure. How does it work? Well, if we had a way to figure out for a portion of the frustum whether it contained a brush, we could

Start with a very low resolution

	struct vec2f {float x, y;};
	struct vec3f {float x, y, z;};

	//============================================================================
	// cone_uniform_vector
	//============================================================================
	// Returns uniformly distributed unit vector on a [0, 0, 1] oriented cone of
	// given apex angle and uniform random vector xi ([x, y] in range [0, 1]).
	// e.g. cos_half_apex_angle = 0 returns samples on a hemisphere (cos(pi/2)=0),
	// while cos_half_apex_angle = -1 returns samples on a sphere (cos(pi)=-1)
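
The preview stops before the function body, so here is a minimal sketch of one common way to get this behaviour. It assumes that `xi.x` picks the height along the cone axis and `xi.y` picks the angle around it. The name `cone_uniform_vector_sketch` and that mapping are assumptions, not the gist's actual code.

```cpp
#include <cmath>

struct vec2f {float x, y;};
struct vec3f {float x, y, z;};

// Sketch only (hypothetical name): maps xi in [0, 1]^2 to a unit vector that
// is uniformly distributed on the spherical cap around +Z whose half apex
// angle has the given cosine.
inline vec3f cone_uniform_vector_sketch(const vec2f &xi, float cos_half_apex_angle)
{
	// Sampling z uniformly in [cos_half_apex_angle, 1] yields uniform area
	// on the cap, because the area of a spherical band depends only on its height.
	float z = 1.0f - xi.x * (1.0f - cos_half_apex_angle);
	// Radius of the circle at height z; clamp guards against tiny negatives.
	float r = std::sqrt(std::fmax(0.0f, 1.0f - z * z));
	// Assumed mapping: xi.y selects the rotation around the +Z axis.
	float phi = 6.28318530718f * xi.y;
	return vec3f{r * std::cos(phi), r * std::sin(phi), z};
}
```

With `cos_half_apex_angle = 0` the returned `z` stays in [0, 1] (hemisphere), and with `cos_half_apex_angle = -1` it covers [-1, 1] (full sphere), matching the comment above.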

	struct vec3f {float x, y, z;};
	struct vec4f {float x, y, z, w;};
	struct mat44f {vec4f x, y, z, w;};

	//============================================================================
	// sphere_screen_extents
	//============================================================================
	// Calculates the exact screen extents xyzw=[left, bottom, right, top] in
	// normalized screen coordinates [-1, 1] for a sphere in view space. For
	// performance, the projection matrix (v2p) is assumed to be set up so that

	In shader programming, you often need every pixel in a compute shader group (tile) to iterate over the same array in memory.
	Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list culled for that tile.

	Simplified HLSL code looks like this:

	Buffer<float4> lightDatas;
	Texture2D<uint2> lightStartCounts;
	RWTexture2D<float4> output;

	[numthreads(8, 8, 1)]

	// NOTE: Must bind 8x single mip RWTexture views, because HLSL doesn't have .mips member for RWTexture2D. (SRVs only have .mips member)
	// NOTE: globallycoherent attribute is needed. Without it writes aren't guaranteed to be seen by other groups
	globallycoherent RWTexture2D<float> MipTextures[8];
	RWTexture2D<uint> Counters[8];
	groupshared uint CounterReturnLDS;

	[numthreads(16, 16, 1)]
	void GenerateMipPyramid(uint3 Tid : SV_DispatchThreadID, uint3 Group : SV_GroupId, uint Gix : SV_GroupIndex)
	{
	[unroll]

	#include <stdio.h>
	#include <math.h>

	float max(float x, float y) {
	return x > y ? x : y;
	}

	class vec3 {
	public:
	float x;

	struct FloatBits
	{
	u32 mantissa : 23;
	u32 exponent : 8;
	u32 sign : 1;
	};
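
The bit-field layout above follows the IEEE 754 binary32 format (23 mantissa bits, 8 exponent bits, 1 sign bit), but the order in which a compiler allocates bit-fields is implementation-defined. As a hedged alternative, the sketch below extracts the same three fields with shifts and masks. `FloatFields` and `splitFloat` are hypothetical names, not part of the gist, and the code assumes `float` is a 32-bit IEEE 754 type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type, not from the original gist.
struct FloatFields
{
	std::uint32_t mantissa; // low 23 bits
	std::uint32_t exponent; // next 8 bits, biased by 127
	std::uint32_t sign;     // top bit
};

// Splits a float into its raw fields without relying on bit-field layout.
inline FloatFields splitFloat(float f)
{
	static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
	std::uint32_t bits;
	std::memcpy(&bits, &f, sizeof bits); // well-defined way to copy the bit pattern
	return FloatFields{bits & 0x7FFFFFu, (bits >> 23) & 0xFFu, bits >> 31};
}
```

For example, `splitFloat(1.0f)` yields a mantissa of 0, a biased exponent of 127, and a sign of 0.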

	template <typename ResultT, typename InputT>
	inline ResultT bitCast(InputT v)
	{

	// From https://github.com/google/filament
	float D_GGX(float linearRoughness, float NoH, const vec3 h) {
	// Walter et al. 2007, "Microfacet Models for Refraction through Rough Surfaces"

	// In mediump, there are two problems computing 1.0 - NoH^2
	// 1) 1.0 - NoH^2 suffers floating point cancellation when NoH^2 is close to 1 (highlights)
	// 2) NoH doesn't have enough precision around 1.0
	// Both problems can be fixed by computing 1-NoH^2 in highp and providing NoH in highp as well

	// However, we can do better using Lagrange's identity:
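
The preview ends here, so the following is a small sketch of the identity the comment refers to, not the Filament source itself. Lagrange's identity states |a x b|^2 = |a|^2 |b|^2 - (a . b)^2, so for unit vectors n and h the term 1 - NoH^2 equals |n x h|^2, which avoids subtracting two nearly equal numbers. The `Vec3` type and the helper names are assumptions made for illustration.

```cpp
#include <cmath>

// Hypothetical minimal vector type for this sketch.
struct Vec3 { float x, y, z; };

inline float dot(const Vec3 &a, const Vec3 &b)
{
	return a.x * b.x + a.y * b.y + a.z * b.z;
}

inline Vec3 cross(const Vec3 &a, const Vec3 &b)
{
	return Vec3{a.y * b.z - a.z * b.y,
	            a.z * b.x - a.x * b.z,
	            a.x * b.y - a.y * b.x};
}

// Computes 1 - NoH^2 as |n x h|^2. Assumes n and h are unit length;
// no "1 - something close to 1" subtraction is performed.
inline float oneMinusNoHSquared(const Vec3 &n, const Vec3 &h)
{
	Vec3 c = cross(n, h);
	return dot(c, c);
}
```

When n and h are nearly parallel (the highlight case), the cross product is small but still well represented, whereas `1.0 - NoH * NoH` would lose most of its significant bits.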