@tcantenot
tcantenot / PerfectHash.cpp
Created January 24, 2021 13:47 — forked from dwilliamson/PerfectHash.cpp
Trivial perfect hash generator that uses only a single array for runtime lookup with next to no ALU. Very good for small table sizes (e.g. < 128). Very bad for larger table sizes. This has been quickly "STL-ified" to remove use of my own containers.
uint32_t NextPow2(uint32_t v)
{
    v--;
    v |= (v >> 1);
    v |= (v >> 2);
    v |= (v >> 4);
    v |= (v >> 8);
    v |= (v >> 16);
    v++;
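
The preview above stops before the function returns. For reference, here is a minimal complete sketch of the same bit-smearing round-up in plain C++. The final `return v;` and the closing brace are assumptions about how the truncated original ends; the comments are added here and are not from the gist.

```cpp
#include <cstdint>

// Rounds v up to the next power of two; exact powers of two are returned unchanged.
// Assumption: the truncated original finishes by returning v.
uint32_t NextPow2(uint32_t v)
{
    v--;              // step down so that exact powers of two map to themselves
    v |= (v >> 1);    // copy the highest set bit into every lower position...
    v |= (v >> 2);
    v |= (v >> 4);
    v |= (v >> 8);
    v |= (v >> 16);   // ...until all bits below the top bit are set
    v++;              // all-ones plus one carries into the next power of two
    return v;
}
```

With this sketch, NextPow2(5) gives 8 and NextPow2(8) stays 8. An input of 0 wraps around during the decrement and comes back as 0, so a caller sizing a table would likely want to handle 0 separately.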
tcantenot / CuteSort.hlsl
Created December 5, 2020 13:11 — forked from dondragmer/CuteSort.hlsl
A very fast GPU sort for sorting values within a wavefront
Buffer<uint> Input;
RWBuffer<uint> Output;
// Returns the index this value should be moved to in order to sort the array
uint CuteSort(uint value, uint laneIndex)
{
    uint smallerValuesMask = 0;
    uint equalValuesMask = ~0;
    // Don't need to test every bit if your value is constrained to a smaller range
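
The preview is cut off before the per-bit loop. As a rough illustration of how a rank-by-ballot sort of this kind can work, here is a CPU emulation in plain C++. This is a sketch under stated assumptions, not the gist's actual code: `Ballot`, `RankInWave`, `SortWave` and `PopCount` are hypothetical names, the wave is modelled as a vector of at most 32 lanes, and the loop body is a guess at the general technique that the two masks in the preview suggest.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Portable population count (hypothetical helper).
static uint32_t PopCount(uint32_t x)
{
    uint32_t count = 0;
    while (x) { x &= x - 1; ++count; }
    return count;
}

// Emulates a wave-wide ballot: bit i of the result is set when lane i has `bit` set.
static uint32_t Ballot(const std::vector<uint32_t>& values, uint32_t bit)
{
    uint32_t mask = 0;
    for (std::size_t lane = 0; lane < values.size(); ++lane)
        if (values[lane] & bit) mask |= 1u << lane;
    return mask;
}

// Returns the sorted position of the value held by laneIndex (assumes values.size() <= 32).
uint32_t RankInWave(const std::vector<uint32_t>& values, uint32_t laneIndex)
{
    const uint32_t laneCount = static_cast<uint32_t>(values.size());
    const uint32_t activeMask = (laneCount == 32) ? ~0u : ((1u << laneCount) - 1u);
    const uint32_t value = values[laneIndex];

    uint32_t smallerValuesMask = 0;        // lanes known to hold a smaller value
    uint32_t equalValuesMask = activeMask; // lanes still equal on all bits tested so far

    for (int b = 31; b >= 0; --b)          // most significant bit first
    {
        const uint32_t bit = 1u << b;
        const uint32_t ballot = Ballot(values, bit);
        if (value & bit)
        {
            // Lanes that matched so far but have a 0 here are smaller.
            smallerValuesMask |= equalValuesMask & ~ballot;
            equalValuesMask &= ballot;
        }
        else
        {
            equalValuesMask &= ~ballot;
        }
    }

    // Ties are broken by lane order so that every lane gets a unique index.
    const uint32_t lowerLanes = (1u << laneIndex) - 1u;
    return PopCount(smallerValuesMask) + PopCount(equalValuesMask & lowerLanes);
}

// Scatters every lane's value to its rank, as each GPU lane would do in parallel.
std::vector<uint32_t> SortWave(const std::vector<uint32_t>& values)
{
    std::vector<uint32_t> sorted(values.size());
    for (uint32_t lane = 0; lane < values.size(); ++lane)
        sorted[RankInWave(values, lane)] = values[lane];
    return sorted;
}
```

Calling SortWave on {5, 3, 5, 1} yields {1, 3, 5, 5}. As the comment in the preview notes, the loop could start at a lower bit when the values are known to fit in a smaller range.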

Minimal D3D11

Minimal D3D11 reference implementation: An uncluttered Direct3D 11 setup & basic rendering primer / API familiarizer. Complete, runnable Windows application contained in a single function and laid out in a linear, step-by-step fashion that should be easy to follow from the code alone. ~215 LOC. No modern C++ / OOP / obscuring cruft.

tcantenot / shared_mem_indexing.cu
Last active November 2, 2020 12:03
Shared memory tile with borders fetches
// Store the tile values (with border) in shared memory:
// - first compute and store the "border" values (done by a subset of the block's threads)
// - then compute the "center" values (the ones corresponding to each thread, which will be used afterwards).
//
// "Center" values: o
// "Border" values: c, h, v
//
// The "border" values are classified into 3 groups:
// - The first tile row + the left border of the 2nd row (range end: r0)
// - The left and right border values + the border and the last row (range end: r1)
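
The preview ends before the third group and before any indexing code. To illustrate the general idea of a tile padded with border values, here is a simplified CPU sketch in plain C++. It is not the gist's range-based scheme (r0, r1): it fills every cell of the padded tile in one loop with clamped fetches. `kTile`, `kBorder`, `LoadTileWithBorder` and `CenterIndex` are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sizes: a 4x4 block of "center" values with a 1-cell border on each side.
constexpr int kTile = 4;
constexpr int kBorder = 1;
constexpr int kPadded = kTile + 2 * kBorder;

// Copies the tile at (tileX, tileY), including its border, from a row-major image
// into a padded buffer. Coordinates outside the image are clamped to its edge.
std::vector<float> LoadTileWithBorder(const std::vector<float>& image,
                                      int width, int height,
                                      int tileX, int tileY)
{
    std::vector<float> tile(kPadded * kPadded);
    const int originX = tileX * kTile - kBorder; // image x of padded cell (0, 0)
    const int originY = tileY * kTile - kBorder; // image y of padded cell (0, 0)

    // On the GPU this loop would be split across the threads of the block.
    for (int i = 0; i < kPadded * kPadded; ++i)
    {
        const int localX = i % kPadded;
        const int localY = i / kPadded;
        const int x = std::min(std::max(originX + localX, 0), width - 1);
        const int y = std::min(std::max(originY + localY, 0), height - 1);
        tile[i] = image[y * width + x];
    }
    return tile;
}

// Index in the padded tile of the "center" value owned by thread (threadX, threadY).
int CenterIndex(int threadX, int threadY)
{
    return (threadY + kBorder) * kPadded + (threadX + kBorder);
}
```

Splitting the border into separate ranges, as the comments above describe, presumably lets a subset of threads fill the border while every thread keeps a fixed center slot; the single loop here trades that for brevity.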
tcantenot / lop3.cu
Created March 25, 2020 23:54 — forked from allanmac/lop3.cu
Test to see if the bit hack "Conditionally set or clear bits without branching" maps to a single Maxwell LOP3.LUT opcode
// -*- compile-command: "nvcc -m 32 -arch sm_50 -Xptxas=-v,-abi=no -cubin lop3.cu"; -*-
#define KERNEL_QUALIFIERS extern "C" __global__
//
// Bit hack: "Conditionally set or clear bits without branching"
// http://graphics.stanford.edu/~seander/bithacks.html#ConditionalSetOrClearBitsWithoutBranching
//
// This bit hack *should* map to a single LOP3.LUT opcode:
//
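
The preview is cut off before the kernel itself. For context, the bit hack from the linked Stanford page can be written as follows in plain C++ (host code, not CUDA). `SetOrClearBits` is a hypothetical name for this sketch, and nothing here checks the claim about the LOP3.LUT opcode; it only shows what the expression computes.

```cpp
#include <cstdint>

// Sets the bits selected by mask in word when flag is true, clears them when
// flag is false, without branching: w ^= (-f ^ w) & m.
uint32_t SetOrClearBits(uint32_t word, uint32_t mask, bool flag)
{
    // 0 - 1 wraps to all ones, 0 - 0 stays zero.
    const uint32_t allOrNone = 0u - static_cast<uint32_t>(flag);
    // Flip exactly those masked bits of word that differ from the wanted value.
    return word ^ ((allOrNone ^ word) & mask);
}
```

For example, with word 0b1010 and mask 0b0110 the result is 0b1110 when the flag is true and 0b1000 when it is false. The result depends on three inputs (word, mask, and the all-ones-or-zero value), which is why a single three-input logic instruction is a plausible target.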