|
# Three Address Codes |
|
|
|
A three address code is an assembly like language where each instruction takes at most |
|
three arguments, for example: |
|
ADD $1 <- y, 5 |
|
is an instruction which takes variables y and constant 5, adds them, and stores the |
|
result in to a variable $1. From this point on, I'm going to denote variables with a $ |
|
followed by an identifier e.g. $1, $x23, etc. |
|
|
|
A lot of our binary operations follow the same pattern: OP result <- arg1 arg2. |
|
But what about other types of instructions, such as loading and storing to memory / |
|
arrays? Well, for those we have load and store instructions: |
|
LOAD result <- base offset // result = base[offset] |
|
STORE base offset <- result // base[offset] = result |
|
|
|
These instructions still follow our convention of three addresses. Sometimes we have |
|
instructions which use fewer than three addresses, such as a MOVE instruction: |
|
MOVE result <- source |
|
|
|
To use as a running example, lets write a small little program: |
|
int A[] = { 1, 2, 3, 4, 5 }; |
|
int avg = 0; |
|
int min = A[0]; |
|
for(int x = 0; x < length(A); x += 1) { |
|
avg = avg + A[x]; |
|
min = MIN(min, A[x]); |
|
} |
|
avg = avg / length(A) |
|
return avg; |
|
|
|
|
|
and then translate it in to a three address code: |
|
.data A { 1, 2, 3, 4, 5 } |
|
MOVE $avg <- 0 |
|
LOAD $min <- A 0 |
|
|
|
MOVE $x <- 0 // loop initialization |
|
start: |
|
LESS_THAN $t0 <- $x 5 // check loop condition |
|
IF_FALSE end <- $t0 |
|
|
|
LOAD $t1 <- A $x // loop body |
|
ADD $avg <- $avg $t1 |
|
MIN $min <- $min $t1 |
|
|
|
ADD $x <- $x 1 // loop effect |
|
JUMP start |
|
end: |
|
DIV $avg <- $avg 5 |
|
RET $avg |
|
|
|
|
|
A few comments before we move on. Fixed tables are encoded directly in to the program, |
|
and so its natural to have some way of describing that to our three address code. That |
|
is where the .data directive comes in to play. It doesn't need to be limited to three |
|
addresses because its not an instruction - its more just telling us to dump these values |
|
in to memory somewhere. Because A is known at compile time, we can treat length(A) as |
|
a constant in our compiler. |
|
|
|
Instruction names are not usually so verbose, but for our sake here I used the full names. |
|
I also used a different form of three address code to emphasize that it really is a lower |
|
level language that you are generating. |
|
|
|
|
|
|
|
# Static Single Assignment |
|
|
|
In order to talk about SSA, we first need to talk about basic blocks. A basic block is |
|
a chunk of code which has a single entry and a single exit. If we look at the above |
|
example, we have 4 basic blocks: |
|
-------- (1) |
|
MOVE $avg <- 0 |
|
LOAD $min <- A 0 |
|
|
|
MOVE $x <- 0 // loop initialization |
|
-------- (2) |
|
start: |
|
LESS_THAN $t0 <- $x 5 // check loop condition |
|
IF_FALSE end <- $t0 |
|
-------- (3) |
|
LOAD $t1 <- A $x // loop body |
|
ADD $avg <- $avg $t1 |
|
MIN $min <- $min $t1 |
|
|
|
ADD $x <- $x 1 // loop effect |
|
JUMP start |
|
-------- (4) |
|
end: |
|
DIV $avg <- $avg 5 |
|
RET $avg |
|
|
|
Knowing that each block has a single entry and exit, we can now reason about each basic |
|
block individually in a powerful way. |
|
|
|
SSA is a modification to our three address code which states that there can only be exactly |
|
one assignment to each variable. This is done by indexing each assignment, so the first |
|
assignment to $x would be $x_1, the second would be $x_2, and so on. Making the necessary |
|
modifications: |
|
-------- (1) |
|
MOVE $avg_1 <- 0 |
|
LOAD $min_1 <- A 0 |
|
|
|
MOVE $x_1 <- 0 // loop initialization |
|
-------- (2) |
|
start: |
|
LESS_THAN $t0_1 <- $x 5 // check loop condition |
|
IF_FALSE end <- $t0_1 |
|
-------- (3) |
|
LOAD $t1_1 <- A $x // loop body |
|
ADD $avg_2 <- $avg $t1_1 |
|
MIN $min_2 <- $min $t1_1 |
|
|
|
ADD $x_2 <- $x 1 // loop effect |
|
JUMP start |
|
-------- (4) |
|
end: |
|
DIV $avg_3 <- $avg 5 |
|
RET $avg_3 |
|
|
|
Now you'll note that this example is incomplete. There are still unsubscripted references to |
|
$x, $avg, and $min. This is because there are now multiple possible choices for each of these. |
|
In order to resolve these, we introduce PHI nodes. |
|
|
|
A PHI node indicates that we need to make a decision about which subscripted version to use. |
|
It is not an instruction; there is no (serious, non-theoretical) architecture out there which |
|
has a PHI instruction. It is simply a tool we will use in our intermediate representation and |
|
we will remove it when it comes time to output assembly code. |
|
|
|
Adding PHI nodes to our example: |
|
-------- (1) |
|
MOVE $avg_1 <- 0 |
|
LOAD $min_1 <- A 0 |
|
|
|
MOVE $x_1 <- 0 // loop initialization |
|
-------- (2) |
|
start: |
|
PHI $x_3 <- $x_1 $x_2 |
|
LESS_THAN $t0_1 <- $x_3 5 // check loop condition |
|
IF_FALSE end <- $t0_1 |
|
-------- (3) |
|
PHI $avg_4 <- $avg_1 $avg_2 |
|
PHI $min_3 <- $min_1 $min_2 |
|
LOAD $t1_1 <- A $x_3 // loop body |
|
ADD $avg_2 <- $avg_4 $t1_1 |
|
MIN $min_2 <- $min_3 $t1 |
|
|
|
ADD $x_2 <- $x_3 1 // loop effect |
|
JUMP start |
|
-------- (4) |
|
end: |
|
PHI $avg_5 <- $avg_1 $avg_2 |
|
PHI $min_3 <- $min_1 $min_2 |
|
DIV $avg_3 <- $avg_5 5 |
|
RET $avg_3 |
|
|
|
So why do we care? The two most obvious answers are constant propegation and dead code elimination. |
|
Constant propegation isn't shown here, but dead code elimination is. Lets start at the end of our |
|
program, and keep a list of things that are requested. |
|
* Starting with our return, we request $avg_3. |
|
* Since $avg_3 is assigned exactly once, we add the $avg_3 arguments to our list as well, which are |
|
$avg_5 and 5. |
|
* The constant 5 doesn't have any arguments. The $avg_5 arguments are $avg_1 and $avg_2 |
|
* ... |
|
Assume we repeat this procedure until we have marked all of the reachable assignments. I also assume |
|
that any jumps are always marked (to preserve the flow of the program). The result is: |
|
-------- (1) |
|
* MOVE $avg_1 <- 0 |
|
LOAD $min_1 <- A 0 |
|
|
|
* MOVE $x_1 <- 0 // loop initialization |
|
-------- (2) |
|
start: |
|
* PHI $x_3 <- $x_1 $x_2 |
|
* LESS_THAN $t0_1 <- $x_3 5 // check loop condition |
|
* IF_FALSE end <- $t0_1 |
|
-------- (3) |
|
* PHI $avg_4 <- $avg_1 $avg_2 |
|
PHI $min_3 <- $min_1 $min_2 |
|
* LOAD $t1_1 <- A $x_3 // loop body |
|
* ADD $avg_2 <- $avg_4 $t1_1 |
|
MIN $min_2 <- $min_3 $t1 |
|
|
|
* ADD $x_2 <- $x_3 1 // loop effect |
|
* JUMP start |
|
-------- (4) |
|
end: |
|
* PHI $avg_5 <- $avg_1 $avg_2 |
|
PHI $min_3 <- $min_1 $min_2 |
|
* DIV $avg_3 <- $avg_5 5 |
|
* RET $avg_3 |
|
|
|
Since all the $min assigments are never requested, they all amount to dead code and can be removed. |
|
Could this be done without SSA? Sure. But SSA is a strong assumption which simplifies a huge class of |
|
different optimizations. |
|
|
|
|
|
|
|
# Assembly |
|
|
|
Now, I won't explain how to turn SSA in to assembly right now, but I did want to show what our example |
|
program would look like if it was written in MIPS assembly: |
|
.text |
|
.globl main |
|
main: |
|
la $t0, A // $t0 = A |
|
li $t1, 0 // $t1 = avg |
|
lw $t2, 0($t0) // $t2 = min |
|
|
|
li $t3, 0 // $t3 = x |
|
start: |
|
slti $t4, $t3, 5 // x < length(A) |
|
beq $t4, $zero, end |
|
|
|
lw $t5, 0($t0) // $t5 = A[x] |
|
add $t1, $t1, $t5 // avg = avg + A[x] |
|
min $t2, $t2, $t5 // min = MIN(min, A[x]) |
|
|
|
addi $t0, $t0, 4 // x = x + 1 |
|
addi $t3, $t3, 1 |
|
j start |
|
end: |
|
li $t6, 5 |
|
div $a0, $t1, $t6 // avg = avg / length(A) |
|
li $v0, 1 // specify "print int" syscall |
|
syscall // print $a0 |
|
|
|
.data |
|
A: .word 1 .word 2 .word 3 .word 4 .word 5 |
|
|
|
And there we have it. Almost. I glossed over the fact that MIPS doesn't actually have a 'MIN' |
|
instruction, but I didn't want to make the examples even more complex. |
@CMCDragonkai @hosewiejacke
It strikes me that this would be an example of where converting from SSA back to machine code is more complex than this explanation eludes to:
Note: this example comes from a lecture by Hugh Leather.