By Leonard Ritter (25.9.2024, last update 6.1.2024)
NOIR is a high-level function-scope intermediate representation for programs, extending SSA Form but omitting basic blocks in favor of a simple acyclic dependency structure that is simpler to evaluate, analyze, canonicalize and optimize, particularly in the context of concurrent and iterative execution, and offers different lowering strategies.
In order to render a function-level IR fully functional, we only need five things: instructions, ordering, branches, merges and loops. In the following sections, we will introduce and describe each element.
As an introductory example and for overview, here is an iterative fibonacci procedure expressed in NOIR pseudocode:
proc(%n) -> %x0 {
%L = event() // loop(%x0, %x1, %i)
%x0 = var(0)
%x1 = var(1)
%i = var(%n)
%2 = %L ? ine(%i, 0) // i% != 0
%3 = case(%2, true) // if %2 then
%8 = %3 ? iadd(%x0, %x1) // %8 = %x0 + %x1
%5 = %3 ? isub(%i, 1) // %5 = %i - 1
%3 ? next(%x0, %x1) // repeat(%x1, %x0 + %x1, %i - 1)
%3 ? next(%x1, %8)
%3 ? next(%i, %5)
%3 ? cue(%L)
}
In its simplest form, the declaration of instructions remains unchanged, same as in static single assignment form:
%value [: T] = instruction(%operand1, %operand2, ...)
Is to be read as: "the name %value
(of type T
) is bound to the result of performing the operation instruction
with the arguments %operand1
, %operand2
etc."
We also permit the value of any instruction to evaluate to an undefined value when the instruction was unsuccessfully executed. This may be implemented by implicitly extending the type T
to an option type defined(T) | undefined
.
As instructions no longer reside in basic blocks, the order of execution is determined by a topological ordering of all instructions dependent on each other. Therefore, in addition to its active dependencies, a product of passive dependencies may be declared on an instruction for the purpose of ordering instructions explicitly. This is the first extension of SSA by NOIR, and a prerequisite for the support of branches:
%predecessor1, %predecessor2, ... ? instruction(%operand1, %operand2, ...)
Is to be read as: "After all %predecessor1
, %predecessor2
, ... have executed and none of them are undefined
, apply this instruction."
A negative passive dependence may also be specified using !
instead of ?
, inverting the meaning of the statement: "After all %predecessor1
, %predecessor2
, ... have executed and all of them are undefined
, apply this instruction."
The "none/all of them are undefined
" part of the passive dependency rule gives natural rise to conditional branching. Hence branches are not implemented by conditional jumps, as there are no blocks and hence no block-terminating instructions. Instead, a single instruction is introduced:
%result : defined(void) | undefined = case(%operand, <integer-constant1>, <integer-constant2>, ...)
Reads as: "assume that %operand
is equal to <integer-constant1>
or <integer-constant2>
etc. If the assumption holds, produce an empty value (void). Otherwise, %result
is undefined
".
The case
form was chosen because it is a superset of the binary conditional and the switch table. All instructions that belong in a case depend on the case passively; instructions that belong to the default case depend using %x !
, where e.g. %x
is a case instruction that verifies the operand is part of the switch table's set of constants.
To support Φ functions, the instruction phi(%a, %b, ...)
is made available, which either evaluates to its one and only defined argument or is undefined otherwise.
In addition, to facilitate support for merging conditionally non-exclusive branches, or(%a, %b, ...)
is introduced, which simply evaluates to the first defined argument in order of appearance, or is undefined otherwise. Where it can be proven that all arguments to a or
instruction are exclusively defined, it may be lowered to phi
.
To support looping flow, we finally introduce a set of instruction pairs. NOIR describes its loops with "update flow", a dataflow extension, rather than with control flow:
%L : defined(void) = event()
cue(%L)
%value1 : defined(typeof(%init)) | undefined = var(%init)
next(%value1, %new_value)
A function is conceptually evaluated in full iterations. The event
instruction produces a void value that can be used as passive dependency by instructions with side effects in order to force an update, and therefore re-execution of the instruction. cue
cues such an event update for the next iteration. Multiple cues for the same event are treated as if the event were cued only once. When an event has been cued (updated), all active and passive dependencies must be updated. All instructions that have not been updated maintain their previous value. The function ends when no events have been cued within the latest iteration.
To permit loops to maintain iterative state, the var
instruction is provided. It initially takes on the value of %init
¹, and may be updated by next
. Any next
encountered mutates the value of the var
it targets for the next iteration, without triggering an update. Each var
must have none or exactly one corresponding next
².
¹) Every update of %init
forces var
to take on its value, overriding any pending assignments.
²) The "exactly one next
" rule may be weakened to permit multiple next
targeting a single var
, which should be rewritten to a single next
sourcing an or
instruction that bundles all previously repeated operands in the reverse topological order they have appeared.
A work-in-progress NOIR compiler exists, written in Scopes, which serves as the primary IR for the next, self-hosting iteration of Scopes.
These concepts constitute the essence of a NOIR program.