Skip to content

Instantly share code, notes, and snippets.

@RealNeGate
Last active August 9, 2022 01:30
Show Gist options
  • Select an option

  • Save RealNeGate/67c762776f2805ac7722a9fee95fae8d to your computer and use it in GitHub Desktop.

Select an option

Save RealNeGate/67c762776f2805ac7722a9fee95fae8d to your computer and use it in GitHub Desktop.

return

Unrolling

This section will discuss the loop unroll decision making process (which loops to unroll, how much and why). Loop unrolling is just duplicating a loop body as to do "multiple iterations per iteration" for example:

size_t i = 0;
for (; i < N; i += 2) {
    work(i), work(i);
}

// fallback loop, for when N doesn't cleanly go into the unrolled
// loop, in this example the trip count of this second loop is 0-1
// so really it could be an if not a loop.
for (; i < N; i++) {
    work(i);
}

This is usually accomplished to allow for better instruction level parallelism (if you interleave the iterations you can potentially get the processor to run the instructions in parallel since they wouldn't be depending on each other in that scenario) or to avoid the overhead of jumping back to the top of the loop (this isn't as problematic anymore).

Case for 2x

Let's start by going through a simple pattern that advocates for a 2x unroll (twice iterations per iteration):

// fib
int c = a + b;
a = b;
b = c;

2x unrolled in SSA:
c0 = a0 + b0;
a1 = b0;
b1 = c0;
c2 = a1 + b1;
a2 = b1;
b2 = c1;

VVV

a2 = a0 + b0;
b2 = b0 + a2;

VVV

a += b
b += a

In this example we've simplified the swap-add (sound familiar 😏, it's what XADD though it's normally used for atomic fetch add not fibonacci) in the iteration of fibonacci such that it just does two fancy additions, let's break this down such that we can make a more generic pattern.

// original
int a = 0;
int b = 1;
for (int i = 0; i < n-1; i++) {
  int c = a + b;
  a = b;
  b = c;
}

return b;


// SSA-form
i1 := 0;
a1 := 0;
b1 := 1;

for (;;) {
  i2 := phi(i1, i3);
  if (i2 >= n-1) break;

  a2 := phi(a1, b2);
  b2 := phi(b1, c);
  c := a2 + b2;

  i3 = i2+1;
}

return phi(b1, b2)

You might notice that a2 depends on b2 which exists in the same basic block and indirectly depends on a2 through c, this is a text book case of a swap. We can generate a pattern from it that looks like this:

  a' := phi(a, b');
  b' := phi(b, c);
  c  := OPERATION(..., a', ..., b', ...);

in this example there's no restrictions on what OPERATION may be just that it depends on both a' and b', now you may be asking why we care about swaps and it's really simple it's because swaps cancel each other out so if we unroll twice we can kill the swaps and the SSA can statically understand the bindings once more.

swap(a, b) // a is b, b is a
b += a     // so this is technically a += b
swap(a, b) // we flip back
b += a     // and this is then b += a

as proven earlier in SSA

if we were writing a proper optimization pass for this we'd have some heuristics on the size of OPERATION but other than that, this can make for one good recipe for the loop unroller.

TODO: probably explain what a good size heuristic looks like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment