sebbbi / BetterBuffers.txt

Created February 28, 2019 05:04

Better buffers

	All current buffer types in shading languages are slightly different ways to present homogeneous arrays (single struct or type repeating N times in memory).

	DirectX has raw buffers (RWByteAddressBuffer), but they are limited to 32-bit integer types, and the implementation doesn't require natural alignment for wide loads, resulting in suboptimal codegen on Nvidia GPUs.

	Complex use cases, such as tree traversal in spatial data structures (physics, ray tracing, etc.), require a data structure that is non-homogeneous. You want different node payloads and a tight memory layout.

	The ability to mix 8/16/32-bit data types and 1d/2d/4d vectors in the same data structure, to facilitate GPU wide loads (max bandwidth), is crucial for complex use cases like this.

	On the other hand, we want a better, more readable and maintainable code syntax than DirectX raw buffers, without manual bit packing/extracting and reinterpret casting. The goal should be to allow modern GPUs to use sub-register addressing (SDWA on AMD hardware). Saving both ALU and register
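
To make the "manual bit packing/extracting and reinterpret casting" complaint concrete, here is a minimal CPU-side sketch in C of what a shader has to do when everything must go through 32-bit integer loads. The node layout, the `NodeSketch` type and both function names are hypothetical and are not taken from the gist; they only illustrate the pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tree node packed into two 32-bit words:
   word 0: child_offset in bits 0..15, child_count in bits 16..23, flags in bits 24..31
   word 1: bit pattern of a 32-bit float */
typedef struct {
    uint16_t child_offset;
    uint8_t  child_count;
    uint8_t  flags;
    float    split;
} NodeSketch;

/* Pack a node into the raw word array, the way a raw 32-bit buffer forces you to. */
static void store_node_sketch(uint32_t *raw, size_t word_index, NodeSketch n) {
    assert(raw != NULL);
    raw[word_index] = (uint32_t)n.child_offset
                    | ((uint32_t)n.child_count << 16)
                    | ((uint32_t)n.flags << 24);
    /* Reinterpret the float bits as an integer word. */
    memcpy(&raw[word_index + 1], &n.split, sizeof n.split);
}

/* Load two words and extract each field with shifts and masks. */
static NodeSketch load_node_sketch(const uint32_t *raw, size_t word_index) {
    assert(raw != NULL);
    uint32_t w0 = raw[word_index];
    uint32_t w1 = raw[word_index + 1];
    NodeSketch n;
    n.child_offset = (uint16_t)(w0 & 0xFFFFu);
    n.child_count  = (uint8_t)((w0 >> 16) & 0xFFu);
    n.flags        = (uint8_t)(w0 >> 24);
    memcpy(&n.split, &w1, sizeof n.split);
    return n;
}
```

Every 8-bit or 16-bit field costs a shift and a mask here, which is the extra ALU work that sub-register addressing could avoid.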

nbouteme / ss-fs.glsl

Last active October 29, 2024 09:54

Skyward Sword Brush shader. Accurately emulates what's done with TEVs in a shader. Does NOT include the blurring pass.

	#version 300 es
	precision highp float;

	in vec2 UV;
	out vec4 out_color;
	uniform float ratio, time;
	uniform sampler2D texture0;

	const float PI_3 = 1.0471975512;

niklas-ourmachinery / reducing-build-times.md

Created January 24, 2019 16:30

Reducing build times by 20 % with a one line change

Experimenting a bit with the /d2cgsummary and /d1reportTime flags described by Aras here and here, I noticed that one of our functions was consistently showing up in the Anomalistic Compile Times section:

1>	Anomalistic Compile Times: 2
1>		create_truth_types: 0.643 sec, 2565 instrs
1>		draw_nodes: 0.180 sec, 5348 instrs

Promit / owner_ptr.h

Last active January 22, 2019 12:14

	/*This is free and unencumbered software released into the public domain.

	Anyone is free to copy, modify, publish, use, compile, sell, or
	distribute this software, either in source code form or as a compiled
	binary, for any purpose, commercial or non-commercial, and by any
	means.

	In jurisdictions that recognize copyright laws, the author or authors
	of this software dedicate any and all copyright interest in the
	software to the public domain. We make this dedication for the benefit

sergekukharev / Latency Numbers Every Programmer Should Know.md

Last active April 8, 2024 11:31

Latency Comparison Numbers (~2012)

Name
L1 cache reference	0.5	ns
Branch mispredict	5	ns
L2 cache reference	7	ns			14x L1 cache
Mutex lock/unlock	25	ns
Main memory reference	100	ns			20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy	3,000	ns	3	us

zeux / roblox-graphics-apis-2019.md

Last active October 3, 2023 08:56

State of Roblox graphics API across all platforms, with percentage deltas since EOY 2018. Updated December 29 2019.

Windows

API	Share
Direct3D 11+	85% (+5%)
Direct3D 10.1	8.5% (-1.5%)
Direct3D 10.0	5.5% (-2.5%)
Direct3D 9	1% (-1%)

rlabrecque / UESizeofTypes.md

Last active March 25, 2026 13:41

Unreal Engine sizeof() types

	Latency Comparison Numbers Simplified (~2012)
	---------------------------------- log2 log10
	L1 cache reference 0 0 ~ 1 ns
	Branch mispredict 3 1
	L2 cache reference 4 1
	Mutex lock/unlock 6 2
	Main memory reference 8 2
	Compress 1K bytes with Zippy 13 4
	Send 1K bytes over 1 Gbps network 14 4
	Read 4K randomly from SSD* 18 5

jspohr / microsecs.c

Last active January 7, 2026 06:16

Avoid overflow when converting time to microseconds

	// Taken from the Rust code base: https://github.com/rust-lang/rust/blob/3809bbf47c8557bd149b3e52ceb47434ca8378d5/src/libstd/sys_common/mod.rs#L124
	// Computes (value*numer)/denom without overflow, as long as both
	// (numer*denom) and the overall result fit into i64 (which is the case
	// for our time conversions).
	int64_t int64MulDiv(int64_t value, int64_t numer, int64_t denom) {
	int64_t q = value / denom;
	int64_t r = value % denom;
	// Decompose value as (value/denom*denom + value%denom),
	// substitute into (value*numer)/denom and simplify.
	// r < denom, so (denom*numer) is the upper bound of (r*numer)
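
The listing above is cut off before the function returns. As a sketch of where the comments lead (not necessarily the gist's exact code), substituting value = q*denom + r into (value*numer)/denom gives q*numer + (r*numer)/denom. The function name `mul_div_sketch` is hypothetical, and the `assert` is an added precondition check.

```c
#include <assert.h>
#include <stdint.h>

/* Computes (value * numer) / denom without forming the full product
   value * numer, which may not fit into 64 bits.
   Assumes denom != 0 and that (numer * denom) and the final result
   fit into int64_t, as stated in the original comments. */
int64_t mul_div_sketch(int64_t value, int64_t numer, int64_t denom) {
    assert(denom != 0);
    int64_t q = value / denom;   /* whole multiples of denom in value */
    int64_t r = value % denom;   /* remainder, smaller than denom in magnitude */
    /* (q*denom + r) * numer / denom == q*numer + (r*numer)/denom */
    return q * numer + r * numer / denom;
}
```

For example, converting 100,000,000,000,015 ticks of a 10 MHz counter to microseconds (numer = 1,000,000, denom = 10,000,000) would overflow in the naive product, while the decomposed form only ever multiplies the small remainder by numer.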

BeRo1985 / MiniSoftFP32.pas

Last active July 30, 2018 02:19

MiniSoftFP32 - A simple small software 32-bit single precision floating point implementation

	unit MiniSoftFP32; // Copyright (C) 2018, Benjamin "BeRo" Rosseaux (benjamin@rosseaux.de) - License: CC0
	// Disclaimer / Notice of caution:
	// Attention, this code implements only the basic functions, but for example not the correct handling of
	// Infinity, NaN, division-by-zero special cases and so on.
	// In short, this code is only intended for demystifying the base floating point arithmetics (using 32-bit
	// single precision floating point values in this implementation).
	{$ifdef fpc}
	{$mode delphi}
	{$if defined(cpu386) or defined(cpuamd64)}
	{$asmmode intel}

Dietmar Suoch didito
