Symptom

In yklua's LLVM AOT IR, we see stuff like:

getelementptr i8, ptr %394, i64 -67108860

A large negative constant offset like that is surprising.

Where does -67108860 come from?

The constant first appears in the pipeline after the instcombine pass, which converts this:

sw.bb2713:                                        ; preds = %if.end28
  %shr2714 = lshr i32 %call31, 7, !dbg !4354
  %sub2716 = sub nsw i32 %shr2714, 16777215, !dbg !4354
  %idx.ext2718 = sext i32 %sub2716 to i64, !dbg !4354
  %add.ptr2719 = getelementptr inbounds i32, ptr %incdec.ptr, i64 %idx.ext2718, !dbg !4357

to this:

sw.bb2713:                                        ; preds = %if.end28
  %shr2714 = lshr i32 %call31, 7, !dbg !4335
  %393 = zext nneg i32 %shr2714 to i64, !dbg !4338
  %394 = getelementptr i32, ptr %incdec.ptr, i64 %393, !dbg !4338
  %add.ptr2719 = getelementptr i8, ptr %394, i64 -67108860, !dbg !4338

This is an optimisation.

The before IR subtracts 16777215 from an index and then offsets a pointer to i32 by the result (via sub+gep).

The after chunk does the same using equivalent computations. It:

- uses the gep to do the sub directly (using a negative index).
- works with a pointer to i8 (instead of i32 like before). getelementptr's index is measured in elements of the given type, so it subtracts 16777215 * 4 = 67108860 (because i32 is 4x as big as i8).
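To check the arithmetic, here is a minimal C sketch that models the byte offset each chunk applies to %incdec.ptr. The function names are hypothetical, and the sketch works with plain integer offsets rather than real pointers, so it only illustrates the equivalence and is not a model of LLVM's semantics.

```c
#include <stdint.h>

/* "Before" IR: lshr, sub nsw, sext, then a gep over i32 (4-byte) elements.
 * Hypothetical helper: returns the byte offset applied to the base pointer. */
static int64_t byte_offset_before(uint32_t call31) {
    int32_t idx = (int32_t)(call31 >> 7) - 16777215; /* lshr + sub nsw */
    return (int64_t)idx * 4;                         /* sext + gep i32 */
}

/* "After" IR: lshr, zext, a gep over i32, then a gep over i8 by -67108860.
 * Hypothetical helper: returns the byte offset applied to the base pointer. */
static int64_t byte_offset_after(uint32_t call31) {
    int64_t idx = (int64_t)(call31 >> 7);            /* lshr + zext nneg */
    return idx * 4 + (-67108860);                    /* gep i32 + gep i8 */
}
```

Both functions agree for every input, because (x - 16777215) * 4 = x * 4 - 67108860.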

(Note that the sub in the first chunk is never poison: %shr2714 is the result of a logical shift right by 7, so it lies in [0, 33554431], and subtracting 16777215 from any value in that range cannot cause a signed overflow.)
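The range of that sub can be checked mechanically. The sketch below (the helper name is hypothetical) does the subtraction in 64 bits, so the check itself cannot overflow, and reports whether the result fits in an i32. The expression is monotonic in %call31, so testing the two extremes covers every input.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if (call31 >> 7) - 16777215 fits in a
 * signed 32-bit integer, i.e. the `sub nsw` would not overflow. */
static int sub_fits_i32(uint32_t call31) {
    int64_t wide = (int64_t)(call31 >> 7) - 16777215;
    return wide >= INT32_MIN && wide <= INT32_MAX;
}
```

Calling it with 0 and UINT32_MAX exercises the smallest and largest possible results (-16777215 and 16777216).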

So this transform is fine: it eliminates a sub.

OK, so why do we subtract a large number (16777215) from a pointer in the first place?

At the top of lopcodes.h, we learn that constant signed operands are stored with a bias.

A signed argument is represented in excess K: the represented value is the written unsigned value minus K, where K is half the maximum for the corresponding unsigned argument.

Using the debug annotations, I tracked the source of this section of code to docondjump() in OP_EQI.

If you follow docondjump() through to where it decodes the constant operands for the jump, we have:

  #define GETARG_sJ(i)  \
      check_exp(checkopm(i, isJ), getarg(i, POS_sJ, SIZE_sJ) - OFFSET_sJ)

And if you print OFFSET_sJ, it's 16777215. We've seen this number before!
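For illustration, here is a standalone C sketch of the excess-K scheme as applied to the sJ operand. The field sizes and positions are what I understand Lua 5.4's lopcodes.h to use, so treat them as assumptions. decode_sJ and encode_sJ are hypothetical helpers, not Lua functions.

```c
#include <stdint.h>

/* Assumed layout: a 7-bit opcode in the low bits, then a 25-bit sJ field. */
#define SIZE_OP   7
#define SIZE_sJ   25
#define POS_sJ    SIZE_OP                 /* sJ starts right after the opcode */
#define MAXARG_sJ ((1 << SIZE_sJ) - 1)    /* largest raw (unsigned) value */
#define OFFSET_sJ (MAXARG_sJ >> 1)        /* the bias K: half the maximum */

/* Hypothetical helper: extract the raw field, then remove the bias. */
static int32_t decode_sJ(uint32_t insn) {
    uint32_t raw = (insn >> POS_sJ) & (uint32_t)MAXARG_sJ;
    return (int32_t)raw - OFFSET_sJ;
}

/* Hypothetical helper: add the bias and pack it next to the opcode. */
static uint32_t encode_sJ(uint32_t opcode, int32_t sj) {
    return ((uint32_t)(sj + OFFSET_sJ) << POS_sJ) | opcode;
}
```

Under these assumptions the shift by 7 and the subtraction of 16777215 line up with the lshr and sub seen in the IR.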

The unbiasing of a constant operand is being combined with the gep that finds the address of the next instruction to execute. It looks strange but is, I believe, correct.

Are you sure?

I did a sanity check:

- rolled LLVM back to the upstream version we last synced at.
- built upstream lua (not yklua) and printed the IR for luaV_execute after every pass.
- saw sub nsw i32 %shr3617, 16777215 all over the IR at early stages of the pipeline.

This suggests to me that what we are seeing is correct.

