Skip to content

Instantly share code, notes, and snippets.

@buybackoff
buybackoff / Benchmark.md
Last active August 31, 2024 20:44
Avx2/branch-optimized binary search in .NET

Binary search is theoretically optimal, but it's possible to speed it up substantially using AVX2 and branchless code even in .NET Core.

Memory access is the limiting factor for binary search. When we access each element for comparison a cache line is loaded, so we could load a 32-byte vector almost free, check if it contains the target value, and if not - reduce the search space by 32/sizeof(T) elements instead of 1 element. This gives quite good performance improvement (code in BinarySearch1.cs and results in the table 1 below).

However, for larger N the search space reduction is quite minimal and the most gains come from switching to linear search. After an interesting discussion in Twitter (especially with @trav_downs), and trying to widen the pivot area to use 2 AVX2 vectors it became clear that just switching to linear search sooner is more important than using AVX2 vectors as pivots.

The linear search was not using AVX2, and for linear

@buybackoff
buybackoff / gitclean.sh
Last active July 12, 2020 12:32
Keep only existing files in git history
# this doesn't support renamed/moved files
# from https://stackoverflow.com/a/17909526/801189
# Works fine on Windows with Git bash shell
$ git checkout master
$ git ls-files > keep-these.txt
$ git filter-branch --force --index-filter \
"git rm --ignore-unmatch --cached -qr . ; \
cat $PWD/keep-these.txt | tr '\n' '\0' | xargs -d '\0' git reset -q \$GIT_COMMIT --" \
--prune-empty --tag-name-filter cat -- --all
@buybackoff
buybackoff / Bench.cs
Last active April 11, 2021 20:54
ByRefCtor
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
namespace ByRefCtor
{
public readonly struct Test
{
private static ref TestMut AsMutable(ref Test d) => ref Unsafe.As<Test, TestMut>(ref d);
@buybackoff
buybackoff / MemoryBenchmark.cs
Created December 9, 2024 22:45
Memory Benchmark that is faster to code than to find an app. Gives 24.9 on a machine with 25.6 theoretical limit on a single thread.
public class MemoryBenchmark
{
public static void Run()
{
const long sizeGb = 2L;
const long arraySize = sizeGb * 1024 * 1024 * 1024;
const int elementSize = sizeof(long);
long[] data = new long[arraySize / elementSize];
Console.WriteLine($"Allocated {sizeGb}GB array");