This is another fun/interesting toy problem based on a LeetCode question.
In this case, the question asks for a function that can find the square root of an integer:
Given a non-negative integer x, return the square root of x rounded down to the nearest integer. The returned integer should be non-negative as well.
You must not use any built-in exponent function or operator.
For example, do not use pow(x, 0.5) in c++ or x ** 0.5 in python.
It's a relatively trivial problem. If you're a math nerd, you're probably already thinking of a binary search or Newton's Method. But, if you're a programming nerd, you're probably thinking "What about fast inverse square root?"
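For the record, the binary search really is as straightforward as it sounds. A minimal sketch (using u64 intermediates so the squaring can't overflow; not the point of this post) looks something like this:

// Minimal binary-search baseline: find the largest n with n*n <= x.
// u64 intermediates keep mid * mid from overflowing for any i32 input.
pub fn sqrt_binary_search(x: i32) -> i32 {
    let x = x as u64;
    // Invariant: lo*lo <= x < hi*hi
    let (mut lo, mut hi) = (0u64, x + 1);
    while lo + 1 < hi {
        let mid = (lo + hi) / 2;
        if mid * mid <= x { lo = mid } else { hi = mid }
    }
    lo as i32
}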
For those unfamiliar, FISR (Fast Inverse Square Root) is a legendary bit of code:
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
In fact, it recently made the news when GitHub's Copilot was shown to regurgitate it word-for-word (you may enjoy poking at the relevant Hacker News thread).
So, what is it? (If you want a good answer, read this).
Well, it's not the exact solution to the LeetCode problem, but that's not going to dissuade us that easily. Let's get to the important part: it's a great method of making an initial guess at the inverse square root:
y = number;
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?
y = * ( float * ) &i;
then applying Newton's Method to improve accuracy (for f(y) = 1/y² - x, whose positive root is 1/√x, the Newton update works out to y = y * (1.5 - 0.5*x*y*y), which is exactly the line below):
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
It works with floating-point values. The number it returns, y, is also a floating-point value, approximately equal to 1/√number (the inverse square root).
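To get a feel for how good that guess is, and how much the single Newton step buys, here's a minimal self-contained Rust sketch of the two stages, compared against the true value (the function names are mine, purely for illustration):

// Illustrative sketch: the bit-hack guess, then one Newton refinement.
fn rsqrt_guess(x: f32) -> f32 {
    let i = x.to_bits();            // reinterpret the float's bits as an integer
    let i = 0x5f3759df - (i >> 1);  // the magic-constant guess
    f32::from_bits(i)               // reinterpret back into a float
}

fn rsqrt_refined(x: f32) -> f32 {
    let y = rsqrt_guess(x);
    y * (1.5 - 0.5 * x * y * y)     // one iteration of Newton's method
}

fn main() {
    let x = 0.15625_f32;
    println!("guess:   {}", rsqrt_guess(x));
    println!("refined: {}", rsqrt_refined(x));
    println!("actual:  {}", 1.0 / x.sqrt());
}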
In our problem, we're given an integer, we want an exact solution, and we want that solution to be just √x rounded down, not its inverse. A first attempt in Rust:
impl Solution {
    pub fn my_sqrt(f: i32) -> i32 {
        // Cast to float
        let x = f as f32;
        // The classic magic-number guess at 1/sqrt(x)
        let i = x.to_bits();
        let i = 0x5f3759df - (i >> 1);
        let y = f32::from_bits(i);
        // One iteration of Newton's method
        let z = y * (1.5 - 0.5 * x * y * y);
        // Invert to get sqrt(x), then round down
        (1.0/z).floor() as i32
    }
}
It works! Well, sort of. Unfortunately, if you submit this on LeetCode, you'll see that it's rather imprecise. For example, it finds the square root of 2147395599 as 46351, when it should be 46339 (46340² is 2147395600, just one more than the input).
We don't have to give up yet - let's go back to Wikipedia. The "Magic Number" section shows an improvement by Jan Kadlec:
Jan Kadlec reduced the relative error by a further factor of 2.7 by adjusting the constants in the single Newton's method iteration as well,[33] arriving after an exhaustive search at:
conv.i = 0x5F1FFFF9 - ( conv.i >> 1 );
conv.f *= 0.703952253f * ( 2.38924456f - x * conv.f * conv.f );
return conv.f;
Unfortunately, the description is rather vague, and my Google searches for his method ended in 404s written in Polish. Fortunately, a slightly broader snippet of his code is provided on the talk page:
float inv_sqrt(float x)
{ union { float f; uint32 u; } y = {x};
y.u = 0x5F1FFFF9ul - (y.u >> 1);
return 0.703952253f * y.f * (2.38924456f - x * y.f * y.f); }
Aha! Now I see what's going on. I think I was thrown off by 'conv' until I realized it was just a basic union.
We can turn that into Rust, skipping the union in favor of the safe to_bits/from_bits conversions:
impl Solution {
    pub fn my_sqrt(f: i32) -> i32 {
        // Cast to float
        let x = f as f32;
        // Kadlec's numbers
        // see https://en.wikipedia.org/wiki/Fast_inverse_square_root#Magic_number
        let yu = x.to_bits();
        let yu = 0x5F1FFFF9 - (yu >> 1);
        let yf = f32::from_bits(yu);
        let z = 0.703952253 * yf * (2.38924456 - x * yf * yf);
        (1.0/z).floor() as i32
    }
}
Testing this on LeetCode with the default cases, we quickly see that it's not quite precise enough to use as a solution for all integers. In addition to Jan's modified 1st iteration of Newton's method (let z = 0.703952253 * yf * (2.38924456 - x * yf * yf);), we'll need two more (regular) iterations to achieve decent precision:
impl Solution {
    pub fn my_sqrt(f: i32) -> i32 {
        let x = f as f32;
        let yu = x.to_bits();
        let yu = 0x5F1FFFF9 - (yu >> 1);
        let yf = f32::from_bits(yu);
        let mut z = 0.703952253 * yf * (2.38924456 - x * yf * yf);
        z = z * (1.5 - (0.5 * x * z * z));
        z = z * (1.5 - (0.5 * x * z * z));
        (1.0/z).floor() as i32
    }
}
Cool!
At this point, I was really hoping that we'd be done. However, we still aren't quite precise enough. There are three tests in LeetCode's 1017 test-cases where we overshoot by 1:
match f {
    807810077  => 28421, // (1.0/z).floor()-1 as i32
    1464898982 => 38273, // (1.0/z).floor()-1 as i32
    2147395599 => 46339, // (1.0/z).floor()-1 as i32
    _ => (1.0/z).floor() as i32
}
So, sadly, we have to ruin the beauty of the prior solution and add a simple check. After we calculate the square root, we make sure that squaring it doesn't exceed the input, f. If it does, we simply subtract 1:
impl Solution {
    pub fn my_sqrt(f: i32) -> i32 {
        // Cast to float and u32
        let x = f as f32;
        let yu = x.to_bits();
        // Kadlec's numbers / modified Newton's method
        // see https://en.wikipedia.org/wiki/Fast_inverse_square_root#Magic_number
        let yu = 0x5F1FFFF9 - (yu >> 1);
        let yf = f32::from_bits(yu);
        let mut z = 0.703952253 * yf * (2.38924456 - x * yf * yf);
        // +2x Newton's method; accuracy drops significantly
        // with any less
        z = z * (1.5 - (0.5 * x * z * z));
        z = z * (1.5 - (0.5 * x * z * z));
        // We may overshoot by 1
        // This will fix it
        let mut sqrt = (1.0/z).floor() as i32;
        if (sqrt*sqrt) > f { sqrt -= 1; }
        sqrt
    }
}
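Outside of LeetCode, it's easy to double-check this against a boring reference. A hypothetical local harness might look like the following (it assumes the impl block above is in the same file; the struct declaration is included here only because LeetCode normally provides it for you):

// Hypothetical local check, not part of the LeetCode submission:
// compare my_sqrt against a floor-sqrt reference and print any disagreements.
struct Solution; // LeetCode declares this for you; added so the file stands alone

fn reference_sqrt(x: i32) -> i32 {
    // f64 has enough precision that sqrt() followed by truncation is exact for any i32
    (x as f64).sqrt() as i32
}

fn main() {
    let tricky = [0, 1, 2, 8, 807810077, 1464898982, 2147395599];
    for x in tricky.into_iter().chain(0..1_000_000) {
        let (got, want) = (Solution::my_sqrt(x), reference_sqrt(x));
        if got != want {
            println!("mismatch: my_sqrt({x}) = {got}, expected {want}");
        }
    }
    println!("done");
}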
And voila! We now pass all test cases. The solution is, notably,