This is another fun/interesting toy problem based on a LeetCode question.
In this case, the question asks for a function that can find the square root of an integer:
Given a non-negative integer x, return the square root of x rounded down to the nearest integer. The returned integer should be non-negative as well.
You must not use any built-in exponent function or operator.
For example, do not use pow(x, 0.5) in c++ or x ** 0.5 in python.
It's a relatively trivial problem. If you're a math nerd, you're probably already thinking of a binary search or Newton's Method. But, if you're a programming nerd, you're probably thinking "What about fast inverse square root?"
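For the record, the binary search really is as straightforward as it sounds. A minimal sketch (using u64 intermediates so the squaring can't overflow; not the point of this post) looks something like this:

// Minimal binary-search baseline: find the largest n with n*n <= x.
// u64 intermediates keep mid * mid from overflowing for any i32 input.
pub fn sqrt_binary_search(x: i32) -> i32 {
    let x = x as u64;
    // Invariant: lo*lo <= x < hi*hi
    let (mut lo, mut hi) = (0u64, x + 1);
    while lo + 1 < hi {
        let mid = (lo + hi) / 2;
        if mid * mid <= x { lo = mid } else { hi = mid }
    }
    lo as i32
}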
For those unfamiliar, FISR (Fast Inverse Square Root) is a legendary bit of code:
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
In fact, it recently made the news when GitHub's Copilot was shown to regurgitate it word-for-word (you may enjoy poking at the relevant Hacker News thread).
So, what is it? (If you want a good answer, read this).
Well, it's not the exact solution to the LeetCode problem, but that's not going to dissuade us that easily. Let's get to the important part: it's a great method of making an initial guess at the inverse square root:
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
then applying Newton's Method to improve accuracy (for f(y) = 1/y² - x, whose positive root is 1/√x, the Newton update works out to y = y * (1.5 - 0.5*x*y*y), which is exactly the line below):
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
It works with floating-point values. The number it returns, y, is also a floating-point value, approximately equal to 1/√number (the inverse square root).
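To get a feel for how good that guess is, and how much the single Newton step buys, here's a minimal self-contained Rust sketch of the two stages, compared against the true value (the function names are mine, purely for illustration):

// Illustrative sketch: the bit-hack guess, then one Newton refinement.
fn rsqrt_guess(x: f32) -> f32 {
    let i = x.to_bits();            // reinterpret the float's bits as an integer
    let i = 0x5f3759df - (i >> 1);  // the magic-constant guess
    f32::from_bits(i)               // reinterpret back into a float
}

fn rsqrt_refined(x: f32) -> f32 {
    let y = rsqrt_guess(x);
    y * (1.5 - 0.5 * x * y * y)     // one iteration of Newton's method
}

fn main() {
    let x = 0.15625_f32;
    println!("guess:   {}", rsqrt_guess(x));
    println!("refined: {}", rsqrt_refined(x));
    println!("actual:  {}", 1.0 / x.sqrt());
}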
In our problem, we're given an integer, we want an exact solution, and we want that solution to be just √x rounded down, not its inverse. A first attempt in Rust:
impl Solution {
    pub fn my_sqrt(f: i32) -> i32 {
        // Cast to float
        let x = f as f32;
        // The classic magic-number guess at 1/sqrt(x)
        let i = x.to_bits();
        let i = 0x5f3759df - (i >> 1);
        let y = f32::from_bits(i);
        // One iteration of Newton's method
        let z = y * (1.5 - 0.5 * x * y * y);
        // Invert to get sqrt(x), then round down
        (1.0/z).floor() as i32
    }
}
It works! Well, sort of. Unfortunately, if you submit this on LeetCode, you'll see that it's rather imprecise. For example, it finds the square root of 2147395599 as 46351, when it should be 46339 (46340² is 2147395600, just one more than the input).
We don't have to give up yet - let's go back to Wikipedia. The "Magic Number" section shows an improvement by Jan Kadlec:
Jan Kadlec reduced the relative error by a further factor of 2.7 by adjusting the constants in the single Newton's method iteration as well,[33] arriving after an exhaustive search at:
conv.i = 0x5F1FFFF9 - ( conv.i >> 1 );
conv.f *= 0.703952253f * ( 2.38924456f - x * conv.f * conv.f );
return conv.f;
Unfortunately, the description is rather vague, and my Google searches for his method ended in 404s written in Polish. Fortunately, a slightly broader snippet of his code is provided on the talk page:
float inv_sqrt(float x)
{ union { float f; uint32 u; } y = {x};
y.u = 0x5F1FFFF9ul - (y.u >> 1);
return 0.703952253f * y.f * (2.38924456f - x * y.f * y.f); }
Aha! Now I see what's going on. I think I was thrown off by 'conv' until I realized it was just a basic union.
We can turn that into Rust, skipping the union in favor of the safe to_bits/from_bits conversions:
impl Solution {
    pub fn my_sqrt(f: i32) -> i32 {
        // Cast to float
        let x = f as f32;
        // Kadlec's numbers
        // see https://en.wikipedia.org/wiki/Fast_inverse_square_root#Magic_number
        let yu = x.to_bits();
        let yu = 0x5F1FFFF9 - (yu >> 1);
        let yf = f32::from_bits(yu);
        let z = 0.703952253 * yf * (2.38924456 - x * yf * yf);
        (1.0/z).floor() as i32
    }
}
Testing this on LeetCode with the default cases, we quickly see that it's not quite precise enough to use as a solution for all integers. In addition to Jan's modified 1st iteration of Newton's method (let z = 0.703952253 * yf * (2.38924456 - x * yf * yf);), we'll need two more (regular) iterations to achieve decent precision:
impl Solution {
    pub fn my_sqrt(f: i32) -> i32 {
        let x = f as f32;
        let yu = x.to_bits();
        let yu = 0x5F1FFFF9 - (yu >> 1);
        let yf = f32::from_bits(yu);
        let mut z = 0.703952253 * yf * (2.38924456 - x * yf * yf);
        z = z * (1.5 - (0.5 * x * z * z));
        z = z * (1.5 - (0.5 * x * z * z));
        (1.0/z).floor() as i32
    }
}
Cool!
At this point, I was really hoping that we'd be done. However, we still aren't quite precise enough. There are three tests in LeetCode's 1017 test-cases where we overshoot by 1:
match f {
    807810077  => 28421, // (1.0/z).floor()-1 as i32
    1464898982 => 38273, // (1.0/z).floor()-1 as i32
    2147395599 => 46339, // (1.0/z).floor()-1 as i32
    _ => (1.0/z).floor() as i32
}
So, sadly, we have to ruin the beauty of the prior solution and add a simple check. After we calculate the square root, we make sure that squaring it doesn't exceed the input, f. If it does, we simply subtract 1:
impl Solution {
    pub fn my_sqrt(f: i32) -> i32 {
        // Cast to float and u32
        let x = f as f32;
        let yu = x.to_bits();
        // Kadlec's numbers / modified Newton's method
        // see https://en.wikipedia.org/wiki/Fast_inverse_square_root#Magic_number
        let yu = 0x5F1FFFF9 - (yu >> 1);
        let yf = f32::from_bits(yu);
        let mut z = 0.703952253 * yf * (2.38924456 - x * yf * yf);
        // +2x Newton's method; accuracy drops significantly
        // with any less
        z = z * (1.5 - (0.5 * x * z * z));
        z = z * (1.5 - (0.5 * x * z * z));
        // We may overshoot by 1
        // This will fix it
        let mut sqrt = (1.0/z).floor() as i32;
        if (sqrt*sqrt) > f { sqrt -= 1; }
        sqrt
    }
}
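Outside of LeetCode, it's easy to double-check this against a boring reference. A hypothetical local harness might look like the following (it assumes the impl block above is in the same file; the struct declaration is included here only because LeetCode normally provides it for you):

// Hypothetical local check, not part of the LeetCode submission:
// compare my_sqrt against a floor-sqrt reference and print any disagreements.
struct Solution; // LeetCode declares this for you; added so the file stands alone

fn reference_sqrt(x: i32) -> i32 {
    // f64 has enough precision that sqrt() followed by truncation is exact for any i32
    (x as f64).sqrt() as i32
}

fn main() {
    let tricky = [0, 1, 2, 8, 807810077, 1464898982, 2147395599];
    for x in tricky.into_iter().chain(0..1_000_000) {
        let (got, want) = (Solution::my_sqrt(x), reference_sqrt(x));
        if got != want {
            println!("mismatch: my_sqrt({x}) = {got}, expected {want}");
        }
    }
    println!("done");
}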
And voila! We now pass all test cases. The solution is, notably,