NOIR: A Neatly Optimal Intermediate Representation

By Leonard Ritter (25.9.2024, last update 6.1.2024)

NOIR is a high-level function-scope intermediate representation for programs, extending SSA Form but omitting basic blocks in favor of a simple acyclic dependency structure that is simpler to evaluate, analyze, canonicalize and optimize, particularly in the context of concurrent and iterative execution, and offers different lowering strategies.

In order to render a function-level IR fully functional, we only need five things: instructions, ordering, branches, merges and loops. In the following sections, we will introduce and describe each element.

As an introductory example and for overview, here is an iterative fibonacci procedure expressed in NOIR pseudocode:

proc(%n) -> %x0 {
    %L = event()                // loop(%x0, %x1, %i)
    %x0 = var(0)
    %x1 = var(1)
    %i = var(%n)
    %2 = %L ? ine(%i, 0)        // i% != 0
    %3 = case(%2, true)         // if %2 then
    %8 = %3 ? iadd(%x0, %x1)    //      %8 = %x0 + %x1
    %5 = %3 ? isub(%i, 1)       //      %5 = %i - 1
    %3 ? next(%x0, %x1)         //      repeat(%x1, %x0 + %x1, %i - 1)
    %3 ? next(%x1, %8)
    %3 ? next(%i, %5)
    %3 ? cue(%L)
}

Instructions

In its simplest form, the declaration of instructions remains unchanged, same as in static single assignment form:

%value [: T] = instruction(%operand1, %operand2, ...)

Is to be read as: "the name %value (of type T) is bound to the result of performing the operation instruction with the arguments %operand1, %operand2 etc."

We also permit the value of any instruction to evaluate to an undefined value when the instruction was unsuccessfully executed. This may be implemented by implicitly extending the type T to an option type defined(T) | undefined.

Ordering

As instructions no longer reside in basic blocks, the order of execution is determined by a topological ordering of all instructions dependent on each other. Therefore, in addition to its active dependencies, a product of passive dependencies may be declared on an instruction for the purpose of ordering instructions explicitly. This is the first extension of SSA by NOIR, and a prerequisite for the support of branches:

%predecessor1, %predecessor2, ... ? instruction(%operand1, %operand2, ...)

Is to be read as: "After all %predecessor1, %predecessor2, ... have executed and none of them are undefined, apply this instruction."

A negative passive dependence may also be specified using ! instead of ?, inverting the meaning of the statement: "After all %predecessor1, %predecessor2, ... have executed and all of them are undefined, apply this instruction."

Branches

The "none/all of them are undefined" part of the passive dependency rule gives natural rise to conditional branching. Hence branches are not implemented by conditional jumps, as there are no blocks and hence no block-terminating instructions. Instead, a single instruction is introduced:

%result : defined(void) | undefined = case(%operand, <integer-constant1>, <integer-constant2>, ...)

Reads as: "assume that %operand is equal to <integer-constant1> or <integer-constant2> etc. If the assumption holds, produce an empty value (void). Otherwise, %result is undefined".

The case form was chosen because it is a superset of the binary conditional and the switch table. All instructions that belong in a case depend on the case passively; instructions that belong to the default case depend using %x !, where e.g. %x is a case instruction that verifies the operand is part of the switch table's set of constants.

Merges

To support Φ functions, the instruction phi(%a, %b, ...) is made available, which either evaluates to its one and only defined argument or is undefined otherwise.

In addition, to facilitate support for merging conditionally non-exclusive branches, or(%a, %b, ...) is introduced, which simply evaluates to the first defined argument in order of appearance, or is undefined otherwise. Where it can be proven that all arguments to a or instruction are exclusively defined, it may be lowered to phi.

Loops

To support looping flow, we finally introduce a set of instruction pairs. NOIR describes its loops with "update flow", a dataflow extension, rather than with control flow:

%L : defined(void) = event()
cue(%L)

%value1 : defined(typeof(%init)) | undefined = var(%init)
next(%value1, %new_value)

A function is conceptually evaluated in full iterations. The event instruction produces a void value that can be used as passive dependency by instructions with side effects in order to force an update, and therefore re-execution of the instruction. cue cues such an event update for the next iteration. Multiple cues for the same event are treated as if the event were cued only once. When an event has been cued (updated), all active and passive dependencies must be updated. All instructions that have not been updated maintain their previous value. The function ends when no events have been cued within the latest iteration.

To permit loops to maintain iterative state, the var instruction is provided. It initially takes on the value of %init¹, and may be updated by next. Any next encountered mutates the value of the var it targets for the next iteration, without triggering an update. Each var must have none or exactly one corresponding next².

¹) Every update of %init forces var to take on its value, overriding any pending assignments.

²) The "exactly one next" rule may be weakened to permit multiple next targeting a single var, which should be rewritten to a single next sourcing an or instruction that bundles all previously repeated operands in the reverse topological order they have appeared.

Implementation

A work-in-progress NOIR compiler exists, written in Scopes, which serves as the primary IR for the next, self-hosting iteration of Scopes.

Conclusion

These concepts constitute the essence of a NOIR program.

paniq/NOIR.md