Created
November 25, 2020 14:48
-
-
Save micycle1/5349fe1f5eab897cd8e8e10b569360cd to your computer and use it in GitHub Desktop.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
http://stereopsis.com/2div.html | |
Ben and 2 Divides | |
Two years ago, Ben Weiss (the guy who wrote Kai's Power Tools, Goo, Soap, etc.) was telling me about this idea he had to combine two divides. I just didn't pay attention. | |
Luckily, he told me again. | |
Ben's idea is really simple, and I use it everywhere now. | |
2 divides at once | |
Suppose you want to compute: | |
a/b AND c/d | |
In the real number system, this is the same as: | |
ad/bd AND cb/bd | |
In case this isn't getting through, that's: | |
inv := 1 / (b * d) | |
inv * a * d AND inv * b * d | |
Multiply instead | |
Of course, modern processors are LOTS faster at multiplies than divides. In fact, in optimal conditions, most of them can do a multiply per cycle, and a divide every 20-40 cycles. So an extra 5 multiplies spread over 2 divides should be worth it, right? | |
Well, I did some speed tests: | |
// This snippet computes c <- a / b, component-wise | |
for (int j = 0; j < n; j += 2) { | |
float ibd = 1.0f / (b[j] * b[j+1]); | |
c[j] = a[j] * b[j+1] * ibd; | |
c[j+1] = a[j+1] * b[j] * ibd; | |
} | |
If you put your P2's FPU in single-precision mode, this code runs in a blazing 9 CYCLES per divide (compared to 17 per divide normally), and you only drop about 1.5 bits in accuracy. If you're in double-precision, you can extend the technique to 3 divides per loop, and you still get even faster. | |
I've used this in texture mappers, perspective projection code (vertex transform, for example), and it's just really a wonder. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment