@vassvik
vassvik / Readme.md
Last active June 15, 2024 04:15
GLFW + glad on Windows

How to get GLFW + glad working on Windows using the Visual Studio compiler (MSVC):

Run everything below from an x64 developer command prompt. I'm using VS 2019, but earlier and later versions should work the same. Building GLFW this way requires git and cmake to be available in PATH. There are no other dependencies.

Compiling GLFW:

git clone https://github.com/glfw/glfw

cd glfw 
@vassvik
vassvik / Simulation_Projection.md
Last active October 16, 2024 20:36
Realtime Fluid Simulation: Projection

Most real-time fluid simulators, like the one in EmberGen, are built around the "Stable Fluids" algorithm by Jos Stam, which to my knowledge was first presented at SIGGRAPH '99. This post is about one part of that algorithm that's often underestimated: projection.

[Video: MG4_F32.mp4]

Stable Fluids

The Stable Fluids algorithm solves a simplified form of the famous Navier-Stokes equations, which describe how fluids move and interact. In particular, it typically solves the incompressible Euler equations, the inviscid special case in which viscous forces are neglected.
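
For reference, the incompressible Euler equations are usually written as below, with velocity $\mathbf{u}$, pressure $p$, constant density $\rho$ and external forces $\mathbf{f}$. The symbols are assumed notation for this sketch, not taken from the post.

$$
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\,\mathbf{u} = -\frac{1}{\rho}\nabla p + \mathbf{f},
\qquad
\nabla\cdot\mathbf{u} = 0
$$

Projection is the step that enforces the second equation. Given an intermediate velocity field $\mathbf{u}^*$ that is not divergence-free after a time step $\Delta t$, one common formulation solves a Poisson equation for the pressure and subtracts its gradient:

$$
\nabla^2 p = \frac{\rho}{\Delta t}\,\nabla\cdot\mathbf{u}^*,
\qquad
\mathbf{u} = \mathbf{u}^* - \frac{\Delta t}{\rho}\,\nabla p
$$

Taking the divergence of the second expression and substituting the first gives $\nabla\cdot\mathbf{u} = 0$, which is the property projection is meant to restore.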

Card Name              | VRAM (GB) | Type  | Release Date | Bandwidth (GB/s)
----------------------------------------------------------------------------
GeForce RTX 2080 Ti    | 11        | GDDR6 | Sep 20 2018  | 616.0
Radeon RX 5700 XT      | 8         | GDDR6 | Jul 7 2019   | 448.0
Radeon RX 580          | 8         | GDDR5 | Apr 18 2017  | 256.0
Radeon RX 570          | 4         | GDDR5 | Apr 18 2017  | 224.0
GeForce RTX 2060       | 6         | GDDR6 | Jan 7 2019   | 336.0
GeForce RTX 2070 SUPER | 8         | GDDR6 | Jul 9 2019   | 448.0
GeForce GTX 1660 Ti    | 6         | GDDR6 | Feb 22 2019  | 288.0
GeForce GTX 1050 Ti    | 4         | GDDR5 | Oct 25 2016  | 112.1
31.932 0.781 235.379 cs_filter3D_27stencil.glsl 512x512x512 R16F [8, 8, 8]
31.973 1.101 235.080 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 32, 1]
31.285 0.552 240.247 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 1, 32]
30.455 1.047 246.794 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 1]
30.281 0.894 248.218 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 1, 16]
32.020 1.188 234.732 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 4]
31.585 0.934 237.969 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 4, 16]
31.712 0.940 237.013 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 16, 16]
30.113 0.383 249.603 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 2, 16]
30.041 0.290 250.198 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 2, 4]
19.432 0.095 386.792 cs_filter3D_27stencil.glsl 512x512x512 R16F [8, 8, 8]
19.150 0.149 392.494 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 32, 1]
18.925 0.132 397.147 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 1, 32]
18.203 0.138 412.910 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 1]
18.483 0.128 406.655 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 1, 16]
19.548 0.142 384.503 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 4]
19.298 0.167 389.487 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 4, 16]
19.458 0.116 386.272 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 16, 16]
18.272 0.117 411.344 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 2, 16]
18.279 0.696 411.186 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 2, 4]
19.344 0.114 cs_filter3D_27stencil.glsl 512x512x512 R16F [8, 8, 8]
19.045 0.116 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 32, 1]
18.796 0.202 cs_filter3D_27stencil.glsl 512x512x512 R16F [32, 1, 32]
18.108 0.386 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 1]
18.860 6.760 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 1, 16]
19.676 0.094 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 16, 4]
19.427 0.106 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 4, 16]
19.628 0.196 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 16, 16]
18.416 0.249 cs_filter3D_27stencil.glsl 512x512x512 R16F [4, 2, 16]
18.358 0.250 cs_filter3D_27stencil.glsl 512x512x512 R16F [16, 2, 4]

Compute memory access pattern throughput benchmarks

We measure the runtime of simple compute shaders that read from one 3D texture through some kind of filter and write the result to another texture. The local work group size of the compute shader is varied over an arbitrary set of sizes, and the effect of different internal texture formats is studied.
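
To make the role of the local work group size concrete, here is a small sketch of how many work groups a dispatch needs to cover a 512x512x512 texture. The local sizes are a few of those that appear in the timing listings. The helper names `dispatch_size` and `invocations_per_group` are hypothetical and not part of the benchmark code.

```python
import math

GRID = (512, 512, 512)  # texture size used in all tests


def dispatch_size(local_size, grid=GRID):
    """Work groups needed per axis so that the groups cover the whole grid."""
    return tuple(math.ceil(g / l) for g, l in zip(grid, local_size))


def invocations_per_group(local_size):
    """Shader invocations in one work group (product of the local sizes)."""
    x, y, z = local_size
    return x * y * z


for local in [(8, 8, 8), (32, 32, 1), (16, 16, 4), (4, 2, 16)]:
    print(local, dispatch_size(local), invocations_per_group(local))
```

Every configuration covers the same number of texels. What changes is the shape of the block of texels handled by one group, which can affect how well neighbouring reads hit the cache.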

All tests are performed using 512x512x512 3D textures. At this size, memory throughput and latency will be the primary bottlenecks, so any extra calculations should have a negligible impact on the timings.
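
The third column of the timing listings is not labelled. It appears consistent with an effective bandwidth in GB/s that counts 27 texel reads and 1 write per voxel at 2 bytes per texel (R16F), assuming the first column is the time in milliseconds. This is an inference from the numbers, not something the text states, and the function name below is hypothetical.

```python
def effective_bandwidth_gbs(time_ms, dim=512, bytes_per_texel=2, reads=27, writes=1):
    """Effective bandwidth in GB/s, counting every stencil read and the write.

    Assumption: each of the dim^3 voxels performs `reads` texel fetches and
    `writes` texel stores, and caching is ignored entirely.
    """
    texels = dim ** 3
    total_bytes = texels * (reads + writes) * bytes_per_texel
    return total_bytes / (time_ms * 1e-3) / 1e9


# First row of the first listing: 31.932 ms, listed as 235.379
print(round(effective_bandwidth_gbs(31.932), 1))  # 235.4
```

Because most of the 27 reads hit the cache, a figure computed this way says more about texture fetch throughput than about raw DRAM traffic, so it should not be compared directly with the peak bandwidth in the card table.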

All timings are measured by averaging the frame time across 128 frames, after a 128-frame warmup, with vsync disabled. Using GPU timer queries might provide more stable numbers.
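
A minimal CPU-side sketch of this measurement scheme follows. `render_frame` is a hypothetical callable standing in for "dispatch the compute shader and present the frame". A wall-clock timer like this is only meaningful if that call blocks until the GPU has finished. The standard deviation is returned as well, on the assumption that a spread value is useful alongside the mean.

```python
import statistics
import time


def average_frame_time_ms(render_frame, warmup=128, frames=128):
    """Run `warmup` untimed frames, then return (mean, stddev) in ms over `frames`."""
    for _ in range(warmup):
        render_frame()

    samples = []
    for _ in range(frames):
        start = time.perf_counter()
        render_frame()
        samples.append((time.perf_counter() - start) * 1e3)

    return statistics.mean(samples), statistics.pstdev(samples)


# Usage with a dummy workload in place of a real frame.
mean_ms, dev_ms = average_frame_time_ms(lambda: sum(range(1000)))
print(f"{mean_ms:.3f} ms +/- {dev_ms:.3f} ms")
```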

The work group sizes are:

@vassvik
vassvik / 1
Last active March 5, 2019 09:13
b: 1x1x1 = 1 elements, 1 non-zero, 0 zeros
1
x: 3x3x3 = 27 elements, 6 non-zeros, 21 zeros
0 0 0
0 1 0
0 0 0
@vassvik
vassvik / obj_loader.go
Last active October 17, 2022 03:49
odin obj loader
package main

import "core:os";
import "core:fmt";
import "core:strconv";

// model data stuff
Model_Data :: struct {
    vertices: [][3]f32,
    indices: []i32,
@vassvik
vassvik / circular_buffer.go
Last active September 8, 2018 13:07
Simple circular buffer in Odin
Circular_Buffer :: struct(T: type, N: int) {
    data: [N]T,
    cursor: int,
    length: int,
}

push_back :: inline proc(using cb: ^$T/Circular_Buffer, v: T.T) -> bool #no_bounds_check {
    data[(cursor + length) %% T.N] = v;
    if length < T.N {
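
The preview above is cut off, so the rest of `push_back` is not shown. As an illustration only, here is a Python sketch of a fixed-size circular buffer that uses the same `(cursor + length) % N` indexing. Two things are assumptions rather than the author's code: a push into a full buffer overwrites the oldest element, and the boolean return value reports whether that happened.

```python
class CircularBuffer:
    """Fixed-capacity ring buffer; `cursor` is the index of the oldest element."""

    def __init__(self, capacity):
        self.data = [None] * capacity
        self.cursor = 0
        self.length = 0

    def push_back(self, value):
        """Append `value`. Returns False if the oldest element was overwritten."""
        n = len(self.data)
        # When the buffer is full this index wraps around to `cursor`,
        # i.e. the slot holding the oldest element.
        self.data[(self.cursor + self.length) % n] = value
        if self.length < n:
            self.length += 1
            return True
        self.cursor = (self.cursor + 1) % n  # oldest element was replaced
        return False

    def items(self):
        """Elements in order from oldest to newest."""
        n = len(self.data)
        return [self.data[(self.cursor + i) % n] for i in range(self.length)]


cb = CircularBuffer(3)
for v in (1, 2, 3, 4):
    cb.push_back(v)
print(cb.items())  # [2, 3, 4]
```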