Compiler Lab: Generating Assembly Code

Introduction

Assigned: Oct. 14
Due: Oct. 30, 11:59 PM

Note: Keshav Kini ([email protected]) is the lead TA for this lab assignment.

This lab is designed mainly to give you some experience in programming in C, and a bit of experience with x86-64 assembly. You will be writing a very basic compiler – in other words, a program that produces x86-64 assembly code when you run it.

Your program will take in a reverse Polish notation arithmetic expression and produce assembly code for a function that computes the value of that expression. This will be explained in detail later in this document.

These instructions are quite long. Read them carefully!

Logistics

This is an individual project. You may not share your work on lab assignments with other students, but do feel free to ask instructors for help if you are stuck, for example during office hours or a discussion section.

You will turn in a tarball (a .tar.gz file) containing the source code for your program, written in C. You must include a Makefile, such that we can build your code using the make command. (If you did Lab 0, you will already be familiar with writing makefiles.) Ensure that your code compiles and runs correctly on skipper.cs.utexas.edu (I’m just choosing a fixed machine for consistency).

Any clarifications and corrections for this lab will be posted on Canvas and Piazza, and if necessary, fixes will be committed to this file.

Background

Reverse Polish notation (RPN) is an alternative way of representing arithmetic expressions like those you might type into a calculator. At one point in time, it was popular to use RPN in physical calculators because it was easier to implement a calculator that read RPN. Similarly, it will be easier for you to do this assignment with RPN than with the normal style of writing arithmetic expressions.

Here are some examples to give you an idea of how RPN works.

Normal: 1 + 1
RPN: 1 1 +
Normal: 3 * (4 + 7)
RPN: 3 4 7 + *
Normal: (8 + 9) * (4 + 3 * 10) - 5
RPN: 8 9 + 4 3 10 * + * 5 -

Do you see how it works? In RPN, arithmetic operators like +, *, and - come after their two operands, rather than in between them. As you enter each number or operator, the RPN calculator proceeds with its computation. When you enter a number, the calculator stores the number on a stack. When you enter an operator, it pops the latest two numbers off of the stack, applies the operator to them, and pushes the result back onto the stack.

Notice how we didn’t need any parentheses to write expressions in RPN! Yes, you might think I cherry-picked examples where RPN didn’t need parentheses, but if you think about it for a while longer you’ll see that parentheses are never necessary in RPN! (Hint: it’s because all the operators take exactly 2 arguments.) That certainly makes it simpler to design the calculator, since you don’t have to keep track of nested parentheses.

Here is an example of how an RPN calculator would process that last example from above, as given by how the calculator’s stack looks after each new input.

You can read more about reverse Polish notation on Wikipedia.

The RPN compiler

We define in this section a simple compiler that takes an expression on the command line and dumps assembly code to stdout (standard output). The assembly code will be a function called compute that accepts as arguments three values for the variables x, y, and z respectively, and returns result of the expression.

Input to your program is an expression (sequence of tokens) given on the command line. This expression may contain integer constants, variables, and operators. Integer constants may be positive, zero, or negative, meaning that they will consist of numeric digits, possibly with a minus sign at the beginning. The only variables allowed are x, y, and z (because the assembly function your program will produce will be a function that accepts three arguments, x, y, and z). The following operators are allowed: +, *, - (plus, times, minus). Expressions are written in reverse Polish notation (as described earlier).

A couple of sample invocations of your program might be:

$ ./compiler x 4 + 2 "*" 4 -
[some assembly code is printed here]
$ ./compiler x 4 + 2 4 "*" -
[some assembly code is printed here]

Note: In most shells (including bash, the one you are probably using), the * character is used to mean “all the files in the current directory”, so unfortunately, the * needs to be put in quotation marks for our purposes, otherwise the shell will replace it with a list of all the files in the current directory.

If variable x has value 3, these expressions should equal 10 and -1, respectively. But if variable x has value 4, the expressions should equal 12 and 0, respectively.

Therefore, the assembly code produced by the first run of your compiler should be a function that returns the number 10 when called with argument x set to 3, and the number 12 when called with argument x set to 4. Meanwhile, the second piece of assembly code should be a function that returns -1 when 3 is passed as argument x, and returns 0 when 4 is passed as argument x.

The output of your program will be some x86-64 assembly code, printed to stdout (standard output). This code should contain a function named compute which accepts three arguments according to the standard x86-64 calling convention. The function should be designed such that it always returns the value of the input RPN expression where the values of x, y, and z have been substituted in. More on that below.

First, think about how you evaluate RPN. Given an RPN expression, do the following:

If you encounter a variable, push its value on the runtime stack.
If you encounter a constant, push it on the runtime stack.
If you encounter an operator, pop the right number of arguments, apply the operator, and push the result.
If done, the runtime stack should simply contain the answer, so retrieve the answer and return it.

Note that we are directly using “the stack” as our RPN calculation stack, rather than making a separate stack data structure or something.

Various problems might occur. For example,

You might encounter an unrecognizable symbol.
There may not be enough arguments on the stack for the operator you are attempting to apply.
There may be more than one value left on the stack when you are finished.

I’d suggest maintaining an integer variable to keep track of how many values would be on the stack at any point during your compilation / execution.

If you encounter any of these errors, print a useful error message to stderr, not stdout, and terminate with a nonzero exit code. Examples of illegal inputs are:

a (because the variable a is not allowed – only x, y, and z are allowed)
4 + (because + needs two arguments)
3 4 (because it will leave two values on the stack)
3 4 4 2 + (because it will leave three values on the stack)
+ (because + needs two arguments)

Aside: It’s important to print to stderr rather than stdout so that the user can separate the assembly code you’re trying to output from any error messages you might produce. For example, when you redirect stdout to a file using the special shell directive ”>”, stderr messages will still be printed to the screen. Sample invocations would look like this:

$ ./compiler 1 2 3 4 + + + > output1.s
$ ./compiler 1 2 3 4 "*" - 5 6 7 + + - "*" 8 9 "*" + > output2.s
$ ./compiler 2 "*" 2 > output3.s
Error: The * operator requires two arguments
$ ./compiler 2 2 "*" > output4.s

In all four invocations, any assembly instructions printed to stdout will be silently put in the specified files, which we are naming with the .s extension because that extension is used for assembly code files. But in the third invocation, there was an error message, and that didn’t get sent to the output3.s file, but rather to the screen, because ”>” only redirects stdout, not stderr.

Now, imagine parsing a legal RPN expression and generating an x86-64 subroutine that would evaluate it. Our expressions may contain only variables x, y, and z. We assume that these are passed (in order) to your program via the usual calling mechanism. That is, x, y, and z are passed in the registers %rdi, %rsi, and %rdx, respectively. You should return your answer in register %rax, which again is specified by the standard calling conventions.

Consider the following RPN expression:

x 4 + 2 * 4 -

A first cut at compiling this might be to generate the following sequence of x86-64 instructions (probably without the comments):

  .globl compute                # Note: this marks that the label
                                # "compute:" is a function name
                                # that can be used from other code.
compute:
  pushq %rdi                    # put x on the stack
  pushq $4                      # put 4 on the stack
  popq  %r10                    # . add the top two things
  popq  %r11                    # |
  addq  %r10, %r11              # |
  pushq %r11                    # '
  pushq $2                      # put 2 on the stack
  popq  %r10                    # . multiply the top two things
  popq  %r11                    # |
  imulq %r10, %r11              # |
  pushq %r11                    # '
  pushq $4                      # put 4 on the stack
  popq  %r10                    # . subtract the top two things
  popq  %r11                    # |
  subq  %r10, %r11              # |
  pushq %r11                    # '
  popq  %rax                    # prepare to return the answer
  retq                          # return from the function

But this is very inefficient. We are pushing numbers on the stack and immediately popping them off again into registers. Also, we’re popping numbers off the stack into registers before performing arithmetic operations on them. It’s true that in x86-64, we can’t write arithmetic instructions where both the source and destination are in memory. But sometimes we’ll want to operate on an integer constant and something on the stack, which can be done with some instructions (like add) and not others (like imul). So maybe we can optimize a bit.

Here’s a second attempt:

  .globl compute
compute:
  pushq %rdi                    # put x on the stack
  addq  $4, (%rsp)              # add 4 (directly in memory)
  popq  %r8                     # . multiply by 2 (using a temporary
  imulq $2, %r8                 # |  register)
  pushq %r8                     # '
  subq  $4, (%rsp)              # subtract 4 (directly in memory)
  popq  %rax                    # prepare to return the answer
  retq                          # return from the function

Notice that add and sub can target a memory location, but imul must target a register. Experiment with (or search the web for information about) different instructions to see which ones can target a memory location and which ones have to use registers.

You can use the as command to call the GNU assembler on your assembly code, to check if it is valid x86-64 assembly. Use the --64 flag to ensure you’re assembling as x86-64 as opposed to 32-bit x86 or any other kind of assembly code.

Now, there’s even more optimization we might be able to do. Notice how we happened to never put more than one thing on the stack at a time. So why don’t we just use a register for that one location on the stack we keep using? Then we don’t need to access memory at all. And in fact we can just use the %rax register, into which we were eventually going to pop that stack value anyway.

Here’s a third attempt:

  .globl compute
compute:
  movq  %rdi, %rax              # put x in %rax
  addq  $4, %rax                # add 4 to %rax
  imulq $2, %rax                # multiply %rax by 2
  subq  $4, %rax                # subtract 4 from %rax
  retq                          # return from the function

Note that we’ve started doing optimizations that are pretty specific to the exact expression we started out with, x 4 + 2 * 4 -. They might not work for all expressions. That means that if you want to implement these optimizations, your C code that generates this assembly will have to be smart and know when it can and can’t do certain optimizations.

For your compilation, do not use callee saved registers – in other words, only use the following registers:

%rax: for the return value and/or intermediate storage
%rsp: the stack pointer
%rdi, %rsi, and %rdx: the registers used for passing in the three arguments
%rcx, %r8, and %r9: the three unused argument-passing registers
%r10, and %r11: the other two caller-saved registers

You can reuse registers if it’s safe to do so, and you can push things onto the stack if you run out of registers at any point.

Your code should be able to handle arbitrary RPN expressions that follow our constraints on variables.

Testing your code

You don’t want to submit your code without testing it first, of course! In this section we’ll cover the various ways in which you can make sure your code is working correctly.

Your code should compile

This one is obvious. Make sure your C code doesn’t have any syntax errors, etc., and make sure that your Makefile is in order. You should be able to go to the directory where your code is, type make, and have an executable file called compiler be created in the directory. Test this on skipper.cs.utexas.edu as well – that machine has an environment similar to what we’ll use for grading.

Your program should produce valid assembly code

Make sure that when you run the compiler executable that you created, with a valid RPN expression passed in as a series of command line arguments, you see valid assembly code printed to the screen.

Try some of the example invocations mentioned earlier in this README, and make up some more of your own.

Beyond just eyeballing it, make sure that the output is valid assembly code by using the special shell directive ”>” (as mentioned earlier) to make your program direct its output into a .s file such as test.s, and then run the assembler on it to see if there are any syntax errors or other problems:

$ as --64 test.s

(As mentioned before, the --64 is to ensure that the assembly code is treated as x86-64 assembly code.)

The assembly code should provide the right function

Your program is required to produce assembly code for a function called compute which takes three 64-bit signed integer arguments and returns a 64-bit signed integer argument. In C, that means the function would have looked something like this:

#include <stdint.h> // this needs to be at the top of the file

int64_t compute (int64_t x, int64_t y, int64_t z) {
  // do stuff
  return something;
}

(In practice you’d often write long long rather than int64_t, and then you wouldn’t need the #include <stdint.h>, but unlike long long, the int64_t type is guaranteed to be exactly 64 bits long, no matter what compiler or architecture you use.)

But since you’re writing the function in assembly, if you wanted to use it in some C code, you would want to sort of “import” it. This is how you do it in C: first, somewhere in your C code, you make an extern declaration of the function, like so:

#include <stdint.h> // this needs to be at the top of the file

extern int64_t compute (int64_t x, int64_t y, int64_t z);

Notice that no function body is provided, only the argument types and return type and the name of the function, with the word extern (for “externally defined”). Then, when you compile your C code, just specify the name of the assembly file which actually contains the function, and gcc will hook it all up nicely. For example,

$ gcc -o full_program some_program.c compute_function.s

So, to test that your assembly code provides the correct function, write some random little C program which uses the compute function you defined (maybe it calls compute on some arguments and prints the result? or whatever), making sure to put in the extern definition shown above.

Then compile the program, while telling gcc the name of the file where you saved the assembly code output from your compiler. If there were no errors, gcc was able to read your assembly code as a definition of a compute function, as desired. (If you follow the format shown in the assembly code examples above, you shouldn’t have any problems.)

By the way, if you’re interested in how this linking process works, check out Chapter 7 in the textbook, which we may get to later in the semester.

The generated functions should work correctly

Try compiling a few RPN expressions of your own choosing. For each one, your program will produce some assembly code. Create a C program that will link with your assembly code (as described in the previous section) and test it by calling it with some particular values of x, y, and z and printing the result. Compile the test program together with the assembly code and run the resulting executable to see if it prints the number you expected.

Make sure to try various different kinds of RPN expressions, to make sure you did everything right. Try ones with many operators, ones with many numbers in a row, ones with negative numbers, ones with really large numbers, etc.

If you’re feeling really ambitious, you might try to write a randomizing test harness that automatically tests your compiler on many different randomly generated RPN expressions, and substitutes in many different randomly generated values of x, y, and z for each RPN expression!

Your submission should be in the correct format

Once you’ve finished programming, it’s time to submit your work. Add a README file explaining the optimization tricks you used to make your assembly code faster, if any, and then use the tar utility to create an archive (“tarball”) of the directory in which you’ve created your code.

Ideally the tarball should only contain source code, so you should get rid of any compiled files, generated assembly files, and other such cruft before making the tarball. But really, as long as we can type make after extracting your tarball and get a working compiled version of your code, that’s good enough. (Make sure to test that your tarball works on skipper.cs.utexas.edu, as mentioned earlier.)

Grading

Your assignment is to write a compiler program as described above.

The basic grading policy for this lab is that you’ll get 90 points if you create a program that correctly does what it is supposed to do – that is, accepts RPN expressions on the command line and dumps assembly code to stdout which defines a function called compute that takes arguments x, y, and z and returns the value of the RPN expression when those values are substituted in for the variables x, y, and z in the expression.

Even if you produce straightforward but inefficient assembly code such as that shown in the first example above, you will still earn the 90 points if the code behaves correctly.

The remaining 10 points will be assigned if you optimize the assembly code you generate in some significant way. Two such ways are as follows.

Avoid using the stack when you can help it, and instead use the various registers you have at your disposal, which are much faster than accessing memory.
Use constant folding to reduce the number of actual computations that would take place at runtime.
For example, if your compiler was given an RPN expression that was constant (i.e. didn’t involve any variables), then your compiler could just figure out the integer value of the expression and create an extremely fast, two-instruction function that just wrote the correct answer into %rax and returned, without doing any work!

Even for RPN expressions that do involve variables, maybe you could do partial constant folding – for example, simplify x 1 2 3 4 + + + + to x 10 +.

If you are feeling really ambitious, there are even more sophisticated forms of partial constant folding that you might try to explore, such as exploiting commutativity, factoring polynomial-like expressions, etc.

With your submission, you should include a README file explaining what optimization techniques your compiler uses.

Beyond the 90 + 10 breakdown given above, determination of further partial credit will be left up to the discretion of the grader.

kini/lab3.org