Skip to content

Instantly share code, notes, and snippets.

@orlp
Last active May 1, 2026 00:18
Show Gist options
  • Select an option

  • Save orlp/3551590 to your computer and use it in GitHub Desktop.

Select an option

Save orlp/3551590 to your computer and use it in GitHub Desktop.
int64_t ipow(int64_t base, uint8_t exp) {
static const uint8_t highest_bit_set[] = {
0, 1, 2, 2, 3, 3, 3, 3,
4, 4, 4, 4, 4, 4, 4, 4,
5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 255, // anything past 63 is a guaranteed overflow with base > 1
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255,
};
int64_t result = 1;
switch (highest_bit_set[exp]) {
case 255: // we use 255 as an overflow marker and return 0 on overflow/underflow
if (base == 1) {
return 1;
}
if (base == -1) {
return 1 - 2 * (exp & 1);
}
return 0;
case 6:
if (exp & 1) result *= base;
exp >>= 1;
base *= base;
case 5:
if (exp & 1) result *= base;
exp >>= 1;
base *= base;
case 4:
if (exp & 1) result *= base;
exp >>= 1;
base *= base;
case 3:
if (exp & 1) result *= base;
exp >>= 1;
base *= base;
case 2:
if (exp & 1) result *= base;
exp >>= 1;
base *= base;
case 1:
if (exp & 1) result *= base;
default:
return result;
}
}
@woodb

woodb commented Nov 17, 2013

Copy link
Copy Markdown

That's some really pretty code, nice work.

@holtdoa

holtdoa commented Jan 5, 2016

Copy link
Copy Markdown

Pretty slick.

@Jabro

Jabro commented Aug 11, 2016

Copy link
Copy Markdown

Pretty cool!
There is one bug though, if you call it e.g. with base = 16 and exponent = 8, the int32_t base will overflow --> as 16^8 = 2^32 --> function will return 0 instead of 2^32

Either pass as int64_t or use a temporary variable with 64 bits

@prakharsingh95

prakharsingh95 commented Sep 5, 2016

Copy link
Copy Markdown

You can calculate highest set bit as i <= 64 ? (int)(log10(i)/log10(2)) : 255.

@mmdts

mmdts commented Apr 15, 2017

Copy link
Copy Markdown

Prakhasingh95, that would prove pointless. The point of this function is that it is so fast, that would just slow it down.

@KevinTyrrell

Copy link
Copy Markdown

A lot of repetitious code here. if (exp & 1) result *= base; is written six times, only needs to be written once. Also the array will be created and destroyed every function call which has overhead. You can move it outside the function header to prevent it from being continuously re-instantiated.

@ian-abbott

Copy link
Copy Markdown

@kevintyrell The code repetition is to optimize for speed. The array is static so will not be created and destroyed at every function call. The placement of the array within the function merely limits its static scope to that function.

@brdann

brdann commented Aug 4, 2018

Copy link
Copy Markdown

That's cool! But it computes unsigned int64_t and returns signed int64_t, what I'm missing?

@shengyang998

shengyang998 commented Nov 27, 2018

Copy link
Copy Markdown

@brdann Yeah, I think it should return signed.
Whatever it is, signed or unsigned, somehow the answer is wrong on my device.

// I got it. The int64_t doesn't work. Change to long long did solve it.

@WarpspeedSCP

Copy link
Copy Markdown

I feel like I'm missing something. There aren't any loops here. Does the function have to be called recursively?

@XMB5

XMB5 commented Feb 13, 2019

Copy link
Copy Markdown

The switch cases fall through

@alejandro-colomar

Copy link
Copy Markdown

You should check for overflow, because squaring base without any checks is asking for trouble. ipow(INT64_MAX, 2) invokes Undefined Behavior. ipow(3, 63) also invokes UB.

@alejandro-colomar

Copy link
Copy Markdown

For the code repetition I would make a static inline function, which avoids repetition, but keeps performance

@Clemapfel

Copy link
Copy Markdown

this can easily be made constexpr / consteval by moving highest_bit_set into the same namespace as the function and making that constexpr too which I recommend as it would be in the spirit of optimal runtime

@Zorgatone

Copy link
Copy Markdown

@orlp

Does this work with uint64_t as well?
Do I need to add elements to highest_bit_set past exponent 63?

@unclefunctor

Copy link
Copy Markdown

The optimization here is at a whole new level. Thanks for sharing and well done sir!

@xsbee

xsbee commented Jun 18, 2022

Copy link
Copy Markdown

Use switch fallthroughs.

@chemoelectric

chemoelectric commented Feb 5, 2023

Copy link
Copy Markdown

That is some dang clever loop unrolling.

@QVRE

QVRE commented Sep 20, 2023

Copy link
Copy Markdown

Wouldn't renaming 255 to 7 make this code easier to turn into an array lookup for the compiler ?

@MOJNICK

MOJNICK commented Jan 29, 2024

Copy link
Copy Markdown

@ian-abbott Have you tested performance for static and non-static "highest_bit_set version"?

@MOJNICK

MOJNICK commented Jan 29, 2024

Copy link
Copy Markdown

@Zorgatone yes you need to add, It will be equal to 7. And you need to add switch-case 7: . + Change signed parameters/values to unsigned. Or if you are sure the results you are aiming are <= int_64_max then of course you can use this function straight away ( in that case casting rules of C applies )

@ian-abbott

ian-abbott commented Jan 30, 2024

Copy link
Copy Markdown

@MOJNICK If highest_bit_set is non-static, the code initializes the array contents every time the function is called, so it is slower than the static version.

@nschaeff

nschaeff commented Mar 8, 2024

Copy link
Copy Markdown

the lookup table for highest bit set can be replaced with __builtin_clz if it is available (gcc, clang). It will make it faster (especially if the lookup table is not in cache, which is likely in a real-world application doing other things than that) and smaller.

Simply replace highest_bit_set[exp] with (exp>0) ? 32 - __builtin_clz(exp) : 0

@mogando668

Copy link
Copy Markdown

your code should end the cases sooner - when exp is down to { 0 , 1 , 2 }

Cuz from there it's very obvious what needs to be returned :::

   return { <= 0 := result
               1 := result * base
               2 := result * base * base  
          }

or more succinctly ::

  return ( base ** exp ) * result

Because when exp == 2 , you already know ahead of time that ( exp >>= 1 ) & 1 must be true, so why bother evaluating the obvious ?

Another speed up trick you can contemplate is that if you already know up-front that the exponent is 1 short of a power of 2, then you already know every single bit is a 1.

In that case, make it into a vanilla countdown loop for # of bits, and simply doing the exact same thing every round without bothering to either right shift or check for ( exp & 1 ). In my own code, for this scenario, I usually make the starting point slightly lagged, so each round I simply do a compound statement of

    result *= base *= base       # intentionally lagged, so one must be careful regarding when you 
                                 # need to perform the extra one to sync them back up with each other

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment