Skip to content

Instantly share code, notes, and snippets.

@dondragmer
dondragmer / BitonicSort.hsl
Last active March 31, 2025 09:09
An optimized bitonic sorting compute shader. Has no bank conflicts, uses shader model 6.6 wave interstices, and works with different sort sizes and per-wave lane counts.
RWBuffer<uint> SortBuffer : register(u0);
static const uint sSortSize = 1024; // can be any power of up to 1024 (the max threads in a group)
// to avoid all bank conflicts there needs to be a space of padding inserted at every multiple of every power of the wave size
// (if an index is a multiple of several powers of the wave size a pad needs to be added for each)
// the smallest wave size possible is 4, so the most padding needed is (sort size) * (1/4 + 1/16 + 1/64 ...)
// which convergets to (sort size) / 3;
static const uint sGroupsharedSortValuesToPaddingRatio = 3;
@dondragmer
dondragmer / PrefixSort.compute
Created January 20, 2021 23:32
An optimized GPU counting sort
#pragma use_dxc //enable SM 6.0 features, in Unity this is only supported on version 2020.2.0a8 or later with D3D12 enabled
#pragma kernel CountTotalsInBlock
#pragma kernel BlockCountPostfixSum
#pragma kernel CalculateOffsetsForEachKey
#pragma kernel FinalSort
uint _FirstBitToSort;
int _NumElements;
int _NumBlocks;
bool _ShouldSortPayload;
@dondragmer
dondragmer / CuteSort.hlsl
Created December 5, 2020 00:11
A very fast GPU sort for sorting values within a wavefront
Buffer<uint> Input;
RWBuffer<uint> Output;
//returns the index that this value should be moved to to sort the array
uint CuteSort(uint value, uint laneIndex)
{
uint smallerValuesMask = 0;
uint equalValuesMask = ~0;
//don't need to test every bit if your value is constrained to a smaller range