Last active
February 1, 2025 21:06
-
-
Save camel-cdr/d16fd2be1fd7b71622649e6bc7fd224a to your computer and use it in GitHub Desktop.
Using the Ziggurat Method for Sampling Random Coordinates From a Unit Circle
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/* | |
* Using the Ziggurat Method for Sampling Random Coordinates From a Unit Circle | |
* by Olaf Bernstein <[email protected]>. | |
* Distributed under the MIT license, see license at the end of the file. | |
* New versions available at https://github.com/camel-cdr/cauldron | |
* | |
* Table of contents =========================================================== | |
* | |
* 1. Introduction | |
* 1.1 API Overview | |
* 2. Ziggurat Method | |
* 3. Benchmarking | |
* 3.1 Alternative Implementations | |
* 3.2 Benchmark | |
* 3.3 Results | |
* 4. Testing | |
* References | |
* Licensing | |
* MIT License | |
* | |
* 1. Introduction ============================================================= | |
* | |
* This project was inspired by the article "When Greedy Algorithms Can Be | |
* Faster" <1>, which covers an interesting problem: | |
* | |
* Generate a random coordinate that is uniformly distributed in a unit circle. | |
* | |
* An easy solution to this problem is to generate random coordinates in the | |
* range [-1,1] and reject them if they don't fall in the unit circle: | |
*/ | |
#include <stdint.h> | |
#include <math.h> | |
#include <stddef.h> | |
#include <string.h> | |
typedef struct { float x, y; } Vec2f; | |
static Vec2f | |
dist_circlef(uint64_t (*rand64)(void*), void *rng) | |
{ | |
float fx, fy; | |
do { | |
uint64_t u64 = rand64(rng); | |
fx = (u64 >> (64 - 24)) * (1.0f / ((1u<<24)-1)); | |
fy = (u64 & 0xffffff) * (1.0f / ((1u<<24)-1)); | |
fx = fx*2 - 1; /* scale to [-1,1] */ | |
fy = fy*2 - 1; | |
} while (fx*fx + fy*fy >= 1.0); /* reject */ | |
Vec2f ret = { fx, fy }; | |
return ret; | |
} | |
/* | |
* We'll subsequently call this implementation the "regular rejection method". | |
* It needs to do on average 27% more iterations than the number of times it is | |
* called, which causes lots of branch miss predictions. | |
* | |
* In the following, we'll present an optimized scalar solution, based on the | |
* Ziggurat Method <2>. | |
* | |
* We won't cover SIMD optimized implementations. There the simple rejection | |
* method combined with a compress instruction is likely the best approach. | |
* | |
* 1.1 API Overview ------------------------------------------------------------ | |
* | |
* Random distributions: | |
* // random 2d coordinate from unit circle | |
* float dist_normalf(uint32_t (*)(void*), void *); | |
* | |
* // a faster dist_normalf implementation using a lookup table | |
* void dist_circlef_zig_init(DistCirclefZig *); | |
* Vec2f dist_circlef_zig(DistCirclefZig *, uint64_t (*)(void*), void *); | |
* | |
* | |
* 2. Ziggurat Method ========================================================== | |
* | |
* The Ziggurat Method <2> was originally introduced by George Marsaglia and | |
* Wai Wan Tsang for generating normally distributed numbers <3>. | |
* | |
* It works by overlapping the normal distribution with boxes, that each | |
* encapsulate the same area of the underlying distribution. | |
* This can be used to quickly accept | |
* coordinates using a small lookup table: +----------------------------+ | |
* | exp(-(x*x)/2) ****** | | |
* 1. A box is randomly selected and a +****------- | | |
* random x coordinate is generated within | *** | | | |
* that box, which reduces the number of | ** | | | |
* total rejections. | Box 3 ***| | | |
* +---------***---- | | |
* 2. If the x coordinate falls within the | :** | | | |
* bounds of the box above the selected box, | Box 2 : ***| | | |
* then it is directly accepted. +--------------***---- | | |
* | Box 1 :*** | | | |
* 3. Otherwise, a concrete random y +------------------**** | | |
* coordinate within the box is generated | Box 0 :****** | | |
* and a high resolution rejection test is +----------------------------+ | |
* applied to the coordinates. | |
* | |
* | |
* We can apply a similar technique to our problem of sampling a 2d coordinate | |
* from a unit circle. | |
* For simplicity, we start off looking at only a quarter of the unit circle | |
* that lies in the first quadrant of the coordinate system, and apply random | |
* signs to the final coordinate later. | |
* Now we can again subdivide the quarter circle in n boxes contain an equal | |
* area A=pi/4/n of the circle. | |
* | |
* We can do the step 1. from above and select a random box, and generate a | |
* random x, and this time also y coordinate within the box. | |
* Since the "high resolution" rejection test is, in contrast to the normal, | |
* distribution very cheap for our problem, we skip step 2. and do a quick | |
* rejection test with x*x + y*y < 1. | |
* | |
* |```---...----------------- |```----.. | |
* | ``-.. | | `--. | |
* | ``-. | | `--. | |
* | A1 `-. | | ``. | |
* | `. | |-----------------``.----- | |
* | `.| | .-``. ^ | |
* |-------------------------`----- | A_trig .-` `. | | |
* | `. | | .-`` `. | h | |
* | A2 ` | | .-` A_seg : | | |
* | `| | .-`` `. : | | |
* |------------------------------`.-- |.` alpha : : v | |
* | `.| \------------------------- | |
* | A3 :| | |
* |________________________________-. | |
* | :| | |
* | A4 :| A1 = A2 = A3 = A4 = pi/4/n = pi/4/4 | |
* | :| | |
* \----------------------------------- | |
* | |
* The challenging part is computing the lookup table that contains the sizes of | |
* our boxes. | |
* | |
* We need to calculate the height h under the quarter circle, given an area A. | |
* This can be done by numerically solving the equation that calculates the | |
* area A of the circle for a given height h of a box: | |
* | |
* A = A_trig + A_seg | |
* A_trig = h*cos(alpha)/2 | |
* A_seg = pi * r^2 * alpha/2/pi = alpha/2 | |
* alpha = asin(h) | |
* A = A_trig + A_seg = h*cos(asin(h))/2 + asin(h)/2 | |
* | |
* We know that h is in the range [0,1], so we can use a binary search to solve | |
* the equation: | |
*/ | |
static double | |
dist_circle_zig__solve(double area) | |
{ | |
double A, h = 0.5; | |
for (size_t i = 2; i < 64; ++i) { | |
A = h*cos(asin(h))/2 + asin(h)/2; | |
h += (A < area ? 1.0 : -1.0) / (1ull<<i); | |
} | |
return h; | |
} | |
/* | |
* With this equation worked out, we can now initialize the lookup table. | |
* | |
* The regular rejection method needs to do on average of 27% more iterations, | |
* while the ziggurat method with the default lookup table size of 128 entries, | |
* we get down to 0.76% additional iterations. | |
*/ | |
#ifndef DIST_CIRCLEF_ZIG_COUNT | |
# define DIST_CIRCLEF_ZIG_COUNT 128 /* must be a power of two and <=2^14 */ | |
#endif | |
#define DIST_CIRCLEF_ZIG_AREA (0.7853981633974483/DIST_CIRCLEF_ZIG_COUNT) /* pi/4/n */ | |
typedef struct { | |
float tbl[DIST_CIRCLEF_ZIG_COUNT*2 + 2]; | |
} DistCirclefZig; | |
static void | |
dist_circlef_zig_init(DistCirclefZig *zig) | |
{ | |
for (size_t i = 0; i < DIST_CIRCLEF_ZIG_COUNT; ++i) { | |
double A = DIST_CIRCLEF_ZIG_AREA * i; | |
double h = dist_circle_zig__solve(A); | |
zig->tbl[i*2+0] = cos(asin(h)); | |
zig->tbl[i*2+1] = h; | |
} | |
zig->tbl[DIST_CIRCLEF_ZIG_COUNT*2+0] = 0.0; | |
zig->tbl[DIST_CIRCLEF_ZIG_COUNT*2+1] = 1.0; | |
} | |
/* | |
* Notice how tbl has on additional coordinate and uses a cumulative height. | |
* This allows us to compute the height of a single box with | |
* tbl[i*2+2]-tbl[i*2]. The vertical position is just tbl[i*2]. | |
* | |
* Note that we don't use a Vec2f array for the lookup table, because the code | |
* generation was considerably worse. | |
*/ | |
/* needed to efficiently apply sign */ | |
static float | |
u32tof(uint32_t x) | |
{ | |
#ifdef __cplusplus | |
float f; memcpy(&f, &x, sizeof f); return f; | |
#else | |
union{ uint32_t i; float f; } u; u.i = x; return u.f; | |
#endif | |
} | |
size_t cnt1 = 0, cnt2 = 0; | |
static Vec2f | |
dist_circlef_zig(DistCirclefZig const *zig, uint64_t (*rand64)(void*), void *rng) | |
{ | |
float fx, fy; | |
uint64_t signs; | |
++cnt1; | |
do { | |
++cnt2; | |
uint64_t u64 = rand64(rng); | |
fx = (u64 >> (64 - 24)) * (1.0f / ((1u<<24)-1)); | |
fy = (u64 << 24 >> (64 - 24)) * (1.0f / ((1u<<24)-1)); | |
signs = u64 >> (64-24*2-2); | |
uint64_t idx = u64 & (DIST_CIRCLEF_ZIG_COUNT-1); | |
float vx = zig->tbl[idx*2+0]; | |
float vy = zig->tbl[idx*2+1]; | |
float ny = zig->tbl[idx*2+3]; | |
fx *= vx; /* scale to box */ | |
fy = fy * (ny - vy) + vy; | |
} while (fx*fx + fy*fy >= 1.0); /* reject */ | |
/* there usually is a sign injection instruction | |
* so this compiles down quite neat */ | |
Vec2f ret = { | |
copysignf(fx, u32tof(signs<<31)), | |
copysignf(fy, u32tof(signs>>1<<31)) | |
}; | |
return ret; | |
} | |
/* | |
* 3. Benchmarking ============================================================= | |
* | |
* 3.1 Alternative Implementations --------------------------------------------- | |
* | |
* One straight forward way to sample a random coordinate from a unit circle is | |
* by generating polar coordinates and converting them to Cartesian | |
* coordinates. | |
* This does however require computing the sine and cosine of an angle, which | |
* is a quite expensive operation. | |
*/ | |
#ifdef DIST_CIRCLE_BENCH | |
static Vec2f | |
dist_circlef_trig(uint64_t (*rand64)(void*), void *rng) | |
{ | |
uint64_t u64 = rand64(rng); | |
float f1 = (u64 >> (64 - 24)) * (1.0f / ((1u<<24)-1)); | |
float f2 = (u64 & 0xffffff) * (1.0f / ((1u<<24)-1)); | |
float r = sqrtf(f1); | |
float theta = f2 * 6.283185307179586; /* f2*2*pi */ | |
Vec2f ret = { r*cosf(theta), r*sinf(theta) }; | |
return ret; | |
} | |
/* | |
* The regular rejection method mentioned in the introduction suffers from a | |
* lot of branch miss predictions, because rejections are quite common, with a | |
* 21% probability. | |
* If we want to generate multiple random coordinates at a time, then we can get | |
* rid of the branch altogether and replace with a much less impactful | |
* register dependency. | |
*/ | |
static void | |
dist_circlef_rej_multi(uint64_t (*rand64)(void*), void *rng, Vec2f *ptr, size_t n) | |
{ | |
for (size_t i = 0; i < n; ) { | |
uint64_t u64 = rand64(rng); | |
float fx = (u64 >> (64 - 24)) * (1.0f / ((1u<<24)-1)); | |
float fy = (u64 & 0xffffff) * (1.0f / ((1u<<24)-1)); | |
fx = fx*2 - 1; | |
fy = fy*2 - 1; | |
ptr[i].x += fx; // Note: we accumulate to make sure it's not | |
ptr[i].y += fy; // optimized away, see benchmark code below | |
i += fx*fx + fy*fy < 1.0; | |
} | |
} | |
/* | |
* 3.2 Benchmark --------------------------------------------------------------- | |
* | |
* We accumulate the results into an array and sum it, to ensure that they | |
* don't get optimized away. | |
*/ | |
#define RANDOM_H_IMPLEMENTATION | |
#include <cauldron/random.h> | |
/* https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h */ | |
#define BENCH_DONT_NORMALIZE | |
#include <cauldron/bench.h> | |
/* https://github.com/camel-cdr/cauldron/blob/main/cauldron/bench.h */ | |
#include <stdio.h> | |
#include <stdlib.h> | |
#define N (1024*1024) | |
#define WARMUP 16 | |
#define REPEAT 64 | |
int | |
main(void) | |
{ | |
DistCirclefZig zig; | |
dist_circlef_zig_init(&zig); | |
PRNG64RomuQuad prng64; | |
prng64_romu_quad_randomize(&prng64); | |
Vec2f *arr = malloc(N * sizeof *arr); | |
memset(arr, 0, N * sizeof *arr); | |
BENCH("trig", WARMUP, REPEAT) { | |
for (size_t i = 0; i < N; ++i) { | |
Vec2f v = dist_circlef_trig(prng64_romu_quad, &prng64); | |
arr[i].x += v.x; arr[i].y += v.y; | |
} | |
BENCH_CLOBBER(); | |
} | |
BENCH("rej", WARMUP, REPEAT) { | |
for (size_t i = 0; i < N; ++i) { | |
Vec2f v = dist_circlef(prng64_romu_quad, &prng64); | |
arr[i].x += v.x; arr[i].y += v.y; | |
} | |
BENCH_CLOBBER(); | |
} | |
BENCH("zig", WARMUP, REPEAT) { | |
for (size_t i = 0; i < N; ++i) { | |
Vec2f v = dist_circlef_zig(&zig, prng64_romu_quad, &prng64); | |
arr[i].x += v.x; arr[i].y += v.y; | |
} | |
BENCH_CLOBBER(); | |
} | |
BENCH("rej_multi", WARMUP, REPEAT) { | |
dist_circlef_rej_multi(prng64_romu_quad, &prng64, arr, N); | |
BENCH_CLOBBER(); | |
} | |
double sum = 0; | |
for (size_t i = 0; i < N; ++i) | |
sum += arr[i].x - arr[i].y; | |
printf("sum: %f\n", sum); | |
printf("%zu, %zu\n", cnt1, cnt2); | |
bench_done(); | |
return 0; | |
} | |
#endif /* DIST_CIRCLE_BENCH */ | |
/* | |
* 3.3 Results ----------------------------------------------------------------- | |
* | |
* Run on Zen1 1600x: | |
* | |
* relative time trig rej zig rej_multi | |
* clang -O3 7.72 2.94 1.23 1.15 | |
* gcc -O3 6.84 2.95 1.32 1.54 | |
* clang -O3 -ffast-math 7.25 2.71 1.45 1.00 | |
* gcc -O3 -ffast-math 6.36 2.72 1.24 1.28 | |
* | |
* The ziggurat method is a clear improvement over both the simple rejection | |
* and trigonometric method. | |
* It's likely the best for latency sensitive and one off usage. | |
* If you need to generate multiple random coordinates uniformly distributed in | |
* a unit circle, then the regular rejection methods ends up faster, although | |
* there you might as well consider a SIMD implementation. | |
* | |
* For SIMD I'd recommend using a fast SIMD PRNG like SHISHUA <4>, convert the | |
* random bits to a uniform coordinates in the range [-1,1], reject with | |
* Pythagoras, and compress store the accepted values. | |
* | |
* | |
* 4 Testing =================================================================== | |
* | |
* For now this is only a visual test, you can compile the file with | |
* cc -xc -DDIST_CIRCLE_TEST -O3 dist_circle.h -lm | |
* and plot the data using | |
* ./a.out > /tmp/1; gnuplot -e 'plot "/tmp/1"' -p | |
* | |
*/ | |
#ifdef DIST_CIRCLE_TEST | |
#define RANDOM_H_IMPLEMENTATION | |
#include <cauldron/random.h> | |
/* https://github.com/camel-cdr/cauldron/blob/main/cauldron/random.h */ | |
#include <stdio.h> | |
int | |
main(void) | |
{ | |
DistCirclefZig zig; | |
dist_circlef_zig_init(&zig); | |
PRNG64RomuQuad prng64; | |
prng64_romu_quad_randomize(&prng64); | |
for (size_t i = 0; i < 10000; ++i) { | |
#if 1 | |
Vec2f v = dist_circlef_zig(&zig, prng64_romu_quad, &prng64); | |
#else | |
Vec2f v = dist_circlef(prng64_romu_quad, &prng64); | |
#endif | |
assert(v.x*v.x+v.y*v.y < 1); | |
printf("%f\t%f\n", v.x, v.y); | |
} | |
return 0; | |
} | |
#endif /* DIST_CIRCLE_TEST */ | |
/* | |
* References ================================================================== | |
* | |
* <1> "When Greedy Algorithms Can Be Faster", Benjamin Summerton (2025) | |
* URL: "https://16bpp.net/blog/post/when-greedy-algorithms-can-be-faster/" | |
* | |
* <2> "Ziggurat algorithm", Wikipedia | |
* URL: "https://en.wikipedia.org/wiki/Ziggurat_algorithm" | |
* | |
* <3> Marsaglia, George, and Wai Wan Tsang (2000): | |
* "The ziggurat method for generating random variables." | |
* Journal of statistical software 5.8: 1-7. | |
* | |
* <4> " SHISHUA – The fastest PRNG in the world", Thaddée Tyl | |
* URL: "https://github.com/espadrine/shishua" | |
*/ | |
/* | |
* Licensing =================================================================== | |
* | |
* MIT License ------------------------------------------------- | |
* | |
* Copyright (c) 2025 Olaf Berstein | |
* Permission is hereby granted, free of charge, to any person obtaining a copy | |
* of this software and associated documentation files (the "Software"), to deal | |
* in the Software without restriction, including without limitation the rights | |
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |
* copies of the Software, and to permit persons to whom the Software is | |
* furnished to do so, subject to the following conditions: | |
* The above copyright notice and this permission notice shall be included in | |
* all copies or substantial portions of the Software. | |
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |
* SOFTWARE. | |
* | |
*/ |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment