regehr/overflow.md

Last active April 13, 2016 22:56

Star (1) You must be signed in to star a gist
Fork (0) You must be signed in to fork a gist

Learn more about clone URLs
Clone this repository at <script src="https://gist.github.com/regehr/65252e90fb6bad82d533.js"></script>
Save regehr/65252e90fb6bad82d533 to your computer and use it in GitHub Desktop.

implementation sketch for faster integer overflow checking in clang/llvm

Raw

I want to speed up integer overflow checking for hardening purposes by keeping a sticky overflow flag and only trapping when necessary. I want to keep it super simple while hopefully giving the optimizers room to do their thing.

In the codegen part of clang:

each function gets an i1 for storing overflow information, initialized to 0
each integer overflow check ORs its result into the overflow flag
before each function call, return instruction, or other side-effecting operation, execude ud2 if overflow is set

Reasonable?

Author

regehr commented Mar 25, 2016

@nadavrot do branches not interfere with superscalar execution? I'm not good at this stuff.

chandlerc commented Mar 25, 2016

@nadavrot You can easily avoid register dependencies. But you're still going to be burning registers extracting and or-ing these things together.

@regehr Yes, branches do interfere with some things, but very minimally when they are perfectly predicted branches to ud2. The worst impacts I'm aware of are mild pressure on the branch prediction tables (but very mild) and blowing out the number of branches supported by the loop stream detector. These will hurt, but they will hurt much less than the alternatives.

nadavrot commented Mar 25, 2016

@regehr In theory there are limited resources in the branch prediction tables. However, my experience with Swift and JavaScript was that it was never a problem. I don't think that non-taken-overflow-branches are inserted into the branch prediction tables.

There is also the cost of fetching, decoding and executing the 'JO' instruction, but like @Chandler pointed out, the alternative is worse.

chandlerc commented Mar 25, 2016

@nadavrot I really liked a recent AMD branch predictor which they actually bothered to document -- from memory each branch would perfectly predict not-taken until taken, then predict taken until not-taken, and only then start using up the normal history tables. Really nice because branches like this (never taken) were reliably free in the prediction table.

But much like your experience with JS and Swift, I've never seen the prediction table's resources actually matter in practice. The closest are collisions in the table due to density of branches, and even that seems to happen much less these days. Yay for modern branch predictors, makes many things easier.

sanjoy commented Mar 25, 2016

This scheme (saturating has_overflowed bit) will also make
optimizing overflow checks away more difficult. I.e. in

i32 a = ...;
if ((a + 2) sign overflowed) set bit;
...
if ((a + 1) sign overflowed) set bit;

you won't be able to optimize away the second check just by doing
control flow analysis (you'll have to rely on the semantics of set bit), but in

i32 a = ...;
if ((a + 2) sign overflowed) abort();
...
if ((a + 1) sign overflowed) abort();

you can.

Author

regehr commented Mar 25, 2016

Ok, I'm convinced, thanks guys, I'll start reading the passes relevant for check removal.

regehr/overflow.md

regehr commented Mar 25, 2016

Uh oh!

chandlerc commented Mar 25, 2016

Uh oh!

nadavrot commented Mar 25, 2016

Uh oh!

chandlerc commented Mar 25, 2016

Uh oh!

sanjoy commented Mar 25, 2016

Uh oh!

regehr commented Mar 25, 2016

Uh oh!