
@regehr
Last active April 13, 2016 22:56
implementation sketch for faster integer overflow checking in clang/llvm

I want to speed up integer overflow checking for hardening purposes by keeping a sticky overflow flag and only trapping when necessary. I want to keep it super simple while hopefully giving the optimizers room to do their thing.

In the codegen part of clang:

  • each function gets an i1 for storing overflow information, initialized to 0
  • each integer overflow check ORs its result into the overflow flag
  • before each function call, return instruction, or other side-effecting operation, execute ud2 if the overflow flag is set

Reasonable?


regehr commented Mar 25, 2016

@nadavrot do branches not interfere with superscalar execution? I'm not good at this stuff.

@chandlerc

@nadavrot You can easily avoid register dependencies. But you're still going to be burning registers extracting and or-ing these things together.

@regehr Yes, branches do interfere with some things, but very minimally when they are perfectly predicted branches to ud2. The worst impacts I'm aware of are mild pressure on the branch prediction tables (but very mild) and blowing out the number of branches supported by the loop stream detector. These will hurt, but they will hurt much less than the alternatives.

@nadavrot

@regehr In theory there are limited resources in the branch prediction tables. However, my experience with Swift and JavaScript was that this was never a problem. I don't think never-taken overflow branches are even inserted into the branch prediction tables.

There is also the cost of fetching, decoding, and executing the 'JO' instruction, but as @chandlerc pointed out, the alternative is worse.

@chandlerc

@nadavrot I really liked a recent AMD branch predictor which they actually bothered to document -- from memory each branch would perfectly predict not-taken until taken, then predict taken until not-taken, and only then start using up the normal history tables. Really nice because branches like this (never taken) were reliably free in the prediction table.

But much like your experience with JS and Swift, I've never seen the prediction table's resources actually matter in practice. The closest are collisions in the table due to density of branches, and even that seems to happen much less these days. Yay for modern branch predictors, makes many things easier.


sanjoy commented Mar 25, 2016

This scheme (a sticky has_overflowed bit) will also make optimizing overflow checks away more difficult. That is, in

i32 a = ...;
if ((a + 2) sign overflowed) set bit;
...
if ((a + 1) sign overflowed) set bit;

you won't be able to optimize away the second check just by doing control flow analysis (you'll have to rely on the semantics of set bit), but in

i32 a = ...;
if ((a + 2) sign overflowed) abort();
...
if ((a + 1) sign overflowed) abort();

you can.


regehr commented Mar 25, 2016

Ok, I'm convinced, thanks guys. I'll start reading the passes relevant to check removal.
