
Paul Evans paulevans

@sebbbi
sebbbi / FastUniformLoadWithWaveOps.txt
Last active May 26, 2025 11:54
Fast uniform load with wave ops (up to 64x speedup)
In shader programming, you often need every pixel in a compute shader group (tile) to iterate over the same array in memory.
Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]
@sinbad
sinbad / SlotArray.cs
Created November 9, 2018 15:27
SlotArray: an array as convenient as a dynamic list but with exposed indexes (similar to using a Dictionary<int,T> but more memory friendly, or an ArrayList which self-manages free slots)
using System;
using System.Collections;
using System.Collections.Generic;
/// <summary>
/// Utility class which stores a dynamic array of objects or value types, and exposes
/// where it places them in its internal storage so you can remove them by index
/// if you need to. Indexes remain stable at all times.
///
/// This is useful for cases like Coroutine where you may not have a reference
@sinbad
sinbad / HashStream.cs
Last active January 15, 2023 14:21
HashStream: simple C# Stream wrapper that calculates a hash of anything passed through it, but doesn't alter the content
using System.IO;
using System.Security.Cryptography;
// Copyright 2018 Steve Streeting
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
// in the Software without restriction, including without limitation the rights
// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the Software is
//
// Author: Jonathan Blow
// Version: 1
// Date: 31 August, 2018
//
// This code is released under the MIT license, which you can find at
//
// https://opensource.org/licenses/MIT
//
//

why doesn't radfft support AVX on PC?

So there are two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.

Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang), and those settings make all SSE code emitted in that file use VEX encoding too. At the time radfft was written, CDep had no way to set compiler flags for just one file, only for the overall build.

[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]
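For readers who have not seen the per-function "target" annotation mentioned above, here is a minimal sketch of what it looks like. It is not radfft code, and as noted the approach is not what is used there. The function names (`add_scalar`, `add_avx`, `add_floats`) and the `SKETCH_HAS_AVX_TARGET` macro are made up for illustration. The attribute and `__builtin_cpu_supports` are GCC/Clang extensions on x86, so the sketch falls back to plain scalar code everywhere else.

```cpp
#include <cstddef>

// Assumption: the per-function target attribute is only attempted on
// GCC/Clang for x86; every other compiler or CPU takes the scalar path.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAS_AVX_TARGET 1
#else
#define SKETCH_HAS_AVX_TARGET 0
#endif

// Baseline path, compiled with the default flags of the translation unit.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if SKETCH_HAS_AVX_TARGET
// Only this function is compiled with AVX enabled, so the rest of the
// file keeps its normal (non-VEX) code generation.
__attribute__((target("avx")))
void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // leftover elements
}
#endif

// Hypothetical dispatcher: checks the CPU at run time before calling AVX code.
void add_floats(const float* a, const float* b, float* out, std::size_t n) {
#if SKETCH_HAS_AVX_TARGET
    if (__builtin_cpu_supports("avx")) {
        add_avx(a, b, out, n);
        return;
    }
#endif
    add_scalar(a, b, out, n);
}
```

Callers only ever use `add_floats`; the AVX path is chosen at run time, which is the same job the separate compilation unit does in the setup described above.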

The other issue is to do with CPU power management.

@shafik
shafik / WhatIsStrictAliasingAndWhyDoWeCare.md
Last active July 7, 2025 14:09
What is Strict Aliasing and Why do we Care?

What is the Strict Aliasing Rule and Why do we care?

(OR Type Punning, Undefined Behavior and Alignment, Oh My!)

What is strict aliasing? First we will describe what aliasing is and then we can learn what being strict about it means.

In C and C++, aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++ the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type that is not allowed, it is classified as undefined behavior (UB). Once we have undefined behavior all bets are off; the results of our program are no longer reliable.
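As a small illustration of the rule, the sketch below reads the bits of a `float` as an integer. The helper name `float_bits` is made up for this example, and the sketch assumes a 32-bit `float`. The commented-out line shows the kind of access the rule forbids; the `std::memcpy` version is one commonly recommended way to get the same result without accessing the value through a disallowed type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the object representation of a float
// as a 32-bit unsigned integer.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");

    // Undefined behavior: accesses a float object through an lvalue of
    // type std::uint32_t, which is not allowed to alias float.
    // std::uint32_t bad = *reinterpret_cast<std::uint32_t*>(&f);

    // Well defined: copy the bytes into an object of the target type.
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a platform with IEEE-754 single-precision floats, `float_bits(1.0f)` yields `0x3F800000`. Compilers typically optimize the `memcpy` into a plain register move, so the well-defined version usually costs nothing extra.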

Unfortunately with strict aliasing violations, we will often obtain the results we expect, leaving the possibility that a future version of a compiler with a new optimization will break code we th

@johnb003
johnb003 / CMakeLists.txt
Last active April 13, 2024 09:03 — forked from ClintLiddick/External_GTest.cmake
CMake ExternalProject_Add for Google Mock (gmock) and Google Test (gtest) Libraries With Includes and Example Usage
# Assuming this is your tests/CMakeLists.txt (and your libs are set up in a root config)
# Just make sure to include(CTest) in your *root* cmake config.
# 3.9 adds support for "GoogleTest" which enumerates the tests inside
# of the code and adds them to ctest.
cmake_minimum_required(VERSION 3.9)
# Configure google-test as a downloadable library.
include(External_GTest.cmake)
#if UNITY_EDITOR
using System.Reflection;
using UnityEngine;
using UnityEditor;
public class FontSwitcher : EditorWindow
{
[MenuItem("Font/Show Window")]
public static void ShowFontWindow()
{
anonymous
anonymous / common.h
Created August 8, 2017 22:14
assert
#define DEBUGBREAK() __ud2()
#ifdef NDEBUG
#define ASSERT(x) __assume(x)
#define ASSERT_IF(cond, x)
#define ASSERT_IFF(cond, x)
#define ASSERT_IF_ELSE(cond, x, y)
#define ASSERTED_EXPRIF(x, t, f) ( (x) ? (t) : (f) )
#else
#define ASSERT(x) do { if(!(x)) DEBUGBREAK(); } while(0)
@gszauer
gszauer / cross.md
Last active December 19, 2017 09:55
Cross Product XYZ Pattern

Step 1

List result xyz top to bottom

result.x =
result.y = 
result.z =