Decaf

P6: Decaf Register Allocation

Objective

The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.

The goal of this part of the project is to implement a bottom-up register allocation algorithm, preparing ILOC for conversion to machine code.

Introduction

The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement register allocation for Decaf, which will be the fifth phase in our compiler. You will do this using a simple bottom-up greedy allocation algorithm.

You should download the project starter files from the "Files" tab in Canvas. The starter files include many new .java files as well as .class files representing compiled solutions.

Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.

If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).

For this project, you will be implementing MyLocalRegisterAllocator. The task of this class is to modify an ILOCProgram that uses an arbitrary number of virtual registers, restricting it to only use a certain number of registers (presumably matching the number of physical registers available on the target architecture).

Assignment

WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system.

For this particular project, you should pay special attention to the following classes:

  • ILOCFunction - This class has been extended to perform conversion to/from a control-flow graph (CFG) representation.
  • ILOCBasicBlock - Represents a basic block of ILOC instructions, which is the basic component of a CFG.
  • RenumberRegistersAndLabels - Re-numbers all registers and labels in an ILOC program for consistency; it also de-shares object instances, making latter modifications easier (e.g., to change a register number, just change the id field in the ILOCOperand).

Implement the MyLocalRegisterAllocator class. You should use a simple, bottom-up register allocation routine similar to the one described in EAC 13.3.2 to convert the original (virtual) register IDs to a fewer number of (physical) register IDs. For the Ensure and Allocate functions, the pseudocode in the book (Figure 13.1) is sufficient. For the main routine, you may find the following minor revision helpful:

# perform register allocation for the given block
allocateRegisters(block):

    for each instruction i in block:

        for each read vr in i:              # make sure operands are in registers
            pr = Ensure(vr)
            replace vr with pr in i
            class.Next[pr] = Dist(vr)
            if class.Next[pr] == INFINITY:
                class.Name[pr] = INVALID
                class.Next[pr] = -1
                class.Free[pr] = true
                push(class, pr)

        for each written vr in i:           # open up a register for the result
            pr = Allocate(vr)
            replace vr with pr in i
            class.Next[pr] = Dist(vr)

        if i is a CALL instruction:         # spill any live registers at function calls
            for each physical register pr:
                if class.Name[pr] is a virtual register:
                    store contents of pr
                    class.Name[pr] = INVALID
                    class.Next[pr] = -1
                    class.Free[pr] = true
                    push(pr)

This routine will work on a single basic block at a time. Normally, this would not yield a globally-valid allocation without taking into account liveness analysis. However, because we have taken pains to emit code in SSA form and to load/store variables every time they are used, no virtual registers will be live at the entrance and exit of each basic block. Thus, this local register allocation scheme works globally for our purposes.

As an additional benefit, we can now add a mechanism that finally will make recursive functions work properly. Previously, there was no way to avoid having recursive calls clobber the registers from concurrent invocations. Now, we can spill all live registers whenever we CALL a function. This is conservative and is not necessary in all cases but again it works nicely for our purposes, allowing us to use recursive functions without directly modifying the ILOC interpreter. The pseudocode above includes this new functionality.

Note that we only have a single register class that we need to worry about during this allocation (e.g., general-purpose integer registers), so there is no need to create a "Class" struct; just keep the data structures locally in MyLocalRegisterAllocator. I recommend just using arrays for most of the structures because they do not need to dynamically resize and the number of physical registers will be small. For register spilling, you will need to track the offsets of each spilled virtual register; I recommend using a map from virtual register ids to stack offsets.

The revised DecafCompiler for this project now calls the RenumberRegistersAndLabels routine to de-share ILOCOperand instances. This is important so that the register allocator can make changes to a single operand without immediately changing any other locations where that operand may be referenced. After that, it calls the register allocator, telling it to reduce the number of registers down to four (4). This number is perhaps a bit overly restrictive (most architectures have at least eight physical general-purpose registers), but it makes testing and debugging easier and it enables easier conversion to x86 and Y86 because we can assume that some of the registers are always available for hard-coded translation idioms.

Notes:

  • The ILOC generation code from the last project generated virtual register IDs starting at 1; however, for the purposes of this project it is more convenient to start at 0 so that they can be used as indices into data structures like name. Thus, reducing to four registers means reducing to r0-r3, as can be seen in the example below.
  • You will need to emit new instructions as you're performing allocation. In Java, this means you cannot use a for-each loop to iterate over the instructions in the each basic block; otherwise, you will get a ConcurrentModificationException.
  • The allocateRegisters function will run once for each basic block, so you must re-initialize any local data structures at the beginning of that function.

Complete Compiler

After you finish this project, you will have code that is (nearly) ready to run on a "real" machine. All that remains is to convert it to true machine code. Toward this end, I have include an x86 and a Y86 generator (GenerateX86.java and GenerateY86.java, repectively). These are fairly staightforward routines that convert each ILOC instruction into corresponding x86/Y86 instructions. I encourage you to look through them to see how this works.

To allow you to experiment with these machine code generators, I have also included the complete Decaf compiler driver for this project; the only change is that the P6 output ("--fdump-iloc-alloc" and "-i" are on by default). This adds a lot of new command-line options. You can run the compiler with no parameters to see the complete list, but here are the ones relevant to this project and to testing the machine code generators:

Standard output options:

    -r <N>          perform register allocation with N physical registers (must be >=2)
    -s              generate NASM x86 assembly (new file w/ extension .s)
    -y              generate Y86 assembly (new file w/ extension .ys)

Binary output options: (requires NASM/yas)

    --osx           link Mac OS X executable
                    (implies "-s", also generates .o and binary)
    --ubuntu        link Ubuntu executable
                    (implies "-s", also generates .o and binary)
    --minielf       link Mini-ELF executable
                    (implies "-y", also generates .yo and binary)

If you run the compiler with the "--minielf" option, it will generate a .ys Y86 assembly file and a compiled .o file, the latter of which should be a valid MiniELF/Y86 executable (this assumes that "yas" is installed, so you may want to do this on stu). Recall that you can execute this file with your CS 261 Y86 interpreter (or the reference interpreter) using the "-e" switch.

This "closes the loop" between CS 261 (your entry to the systems courses) and CS 432 (your advanced systems elective). Hopefully you have gained an appreciation for and a greater understanding of the complexity (and yet beauty!) of many aspects of traditional computer software systems. Along the way, you have built components for two large pieces of systems software (an interpreter and a compiler). Congratulations on completing this non-trivial accomplishment!

I hope that this opens the door to a lifetime of learning and improving your knowledge of the various computer systems that you interact with and use on a daily basis. It is important that we always have a wealth of individuals who understand not only how to use a system, but how the system works "under the hood." Good luck to you as you finish your time at JMU and begin your career!

Sample Input

def int add(int x, int y)
{
    return x + y;
}

def int main()
{
    int a;
    a = 3;
    return add(a, 2);
}

Sample Output

add:
  push bp                               // prologue
  i2i sp => bp
  addI sp, 0 => sp                      // allocate space for local variables (0 bytes)
  loadAI [bp+8] => r0 {x}
  loadAI [bp+12] => r1 {y}
  add r0, r1 => r0
  i2i r0 => ret
  jump l1                               // return (x+y)
l1:
  i2i bp => sp                          // epilogue
  pop bp
  return

main:
  push bp                               // prologue
  i2i sp => bp
  addI sp, -4 => sp                     // allocate space for local variables (4 bytes)
  loadI 3 => r0
  storeAI r0 => [bp-4] {a}              // a = 3
  loadAI [bp-4] => r0 {a}
  loadI 2 => r1
  push r1
  push r0
  call add
  addI sp, 8 => sp                      // de-allocate space for parameters (8 bytes)
  i2i ret => r0
  i2i r0 => ret
  jump l2                               // return add(a, 2)
l2:
  i2i bp => sp                          // epilogue
  pop bp
  return

Submission

Please submit your MyLocalRegisterAllocator.java file to the appropriate assignment on Canvas by the due date indicated there.

Code Reviews

Because this is the last project of the semester, there will be no code review.

Grading

Here is the grading breakdown for this project:

Automated tests90
Instructor review10
TOTAL100