Decaf

P5: Decaf Register Allocation

Objective

The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.

The goal of this part of the project is to implement a bottom-up register allocation algorithm, preparing ILOC for conversion to machine code.

Introduction

The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement register allocation for Decaf, which will be the fifth phase in our compiler. You will do this using a simple bottom-up greedy allocation algorithm.

The starter files are available on stu at the following path:

/cs/students/cs432/f20/p5-regalloc.tar.gz

Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.

If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).

For this project, you will be implementing a routine that performs analysis on an InsnList that uses an arbitrary number of virtual registers, transforming it to only use a certain number of registers (presumably matching the number of physical registers available on the target architecture).

Assignment

WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system.

Implement code generation by providing an implementation for the following function:

void allocate_registers (InsnList* list, int num_physical_registers);

For this particular project, you should pay special attention to the following files:

  • src/main.c - The overall driver for the compiler. At this point it implements all phases of compilation. You may change this file if it is useful for testing, but your submission should work with the original version.
  • src/p5-regalloc.c - The code generation implementation. Your implementation of allocate_registers will go here, and you may add helper functions here if you find them useful (but note that they will be inaccessible to other modules--you should NOT modify p5-regalloc.h).
  • src/y86.c - Converts ILOC code into Y86 assembly. See the "Complete Compiler" section below for more details.
  • tests - Testing framework folder. These files use the same general layout as described in the CS 261 project guide, so you may wish to review that document. UNLIKE IN CS 261, YOU WILL NOT BE PROVIDED ALL TEST CASES, only a few sanity test checks to help you estimate your progress in completing the project. You should significantly extend the provided test suite and convince yourself that your solution completely implements this phase of the compiler as described by the language reference and the description provided here.

You should use a simple, bottom-up register allocation routine similar to the one described in EAC 13.3.2 to convert the original (virtual) register IDs to a fewer number of (physical) register IDs. Your routine should work at the basic block level. I recommend using a primary loop that iterates over the instructions and identifies leaders (e.g., jump or call targets). This will allow you to delegate to a helper function that handles each block in turn, easing the process of data structure initialization for each block.

Having a loop that identifies leaders will also allow you to more easily identify the local stack allocator instruction (i.e., the ADD_I instruction that is the third instruction in each function) so that the stack frame size can be adjusted when a new value is spilled. I have provided several helper functions to assist with inserting load and store instructions for the spills.

Here is an alternative version of the psuedocode that you may find easier to use than the textbook's version:

// main physical/virtual mapping (INVALID indicates empty register)
name : physical register id => virtual register id

// tracks stack offset for spilled virtual registers
offset : virtual register id => int

allocate_registers(block):
    for each instruction i in block:

        for each read vr in i:
            pr = ensure(vr)                     // make sure vr is in a phys reg
            replace vr with pr in i             // change register id

            if dist(vr) == INFINITY:            // if no future use
                name[pr] = INVALID              // then free pr

        for each written vr in i:
            pr = allocate(vr)                   // make sure phys reg is available
            replace vr with pr in i             // change register id

        // spill any live registers before procedure calls
        if i is a CALL instruction:
            for each pr where name[pr] != INVALID:
                spill(pr)

ensure(vr):
    if name[pr] == vr for some pr:              // if the vr is in a phys reg
        return pr                               // then use it
    else
        pr = allocate(vr)                       // otherwise, allocate a phys reg
        if offset[vr] is valid:                 // if vr was spilled, load it
            emit load into pr from offset[vr]
        return pr                               // and use it

allocate(vr):
    if name[pr] == INVALID for some pr:         // if there's a free register
        name[pr] = vr                           // then allocate it
        return pr                               // and use it
    else:
        find pr that maximizes dist(name[pr])   // otherwise, find register to spill
        spill(pr)                               // spill value to stack
        name[pr] = vr                           // reallocate it
        return pr                               // and use it

dist(vr):
    return number of instructions until vr is next used (INFINITY if no use)

spill(pr):
    emit store from pr onto the stack at some offset x
    offset[name[pr]] = x
    name[pr] = INVALID

As mentioned before, this routine should work on a single basic block at a time. Normally, this local approach would not yield a globally-valid allocation without taking into account liveness analysis. However, because we have taken pains to emit code in SSA form and to load/store variables every time they are used, no virtual registers will be live at the entrance and exit of each basic block. Thus, this local register allocation scheme works globally for our purposes.

As an additional benefit, we can now add a mechanism that finally will make recursive functions work properly. Previously, there was no way to avoid having recursive calls clobber the registers from concurrent invocations. Now, we can spill all live registers whenever we CALL a function. This is conservative and is not necessary in all cases but again it works nicely for our purposes, allowing us to use recursive functions without directly modifying the ILOC interpreter. The pseudocode above includes this new functionality.

Note that we only have a single register class that we need to worry about during this allocation (e.g., general-purpose integer registers), so there is no need to create a "Class" struct; just keep the data structures locally in p5-regalloc.c. I recommend just using arrays for most of the structures because they do not need to dynamically resize and the number of physical registers will be small. For register spilling, you will need to track the offsets of each spilled virtual register; I recommend using a map from virtual register ids to stack offsets.

The revised compiler driver reduces the number of registers down to four (4). This number is perhaps a bit overly restrictive (most architectures have at least eight physical general-purpose registers), but it makes testing and debugging easier and it enables easier conversion to x86 and Y86 because we can assume that some of the registers are always available for hard-coded translation idioms.

Notes:

  • The allocate_registers function will run once for each basic block, so you must re-initialize any local data structures at the beginning of that function.
  • There is a MAX_VIRTUAL_REGS constant defined in iloc.h that you can use as an upper bound on the number of virtual registers.
  • You'll want to use the ILOCInsn_get_read_registers and ILOCInsn_get_write_register helper functions documented in iloc.h.
  • The test harness includes a testing routine that returns -9999 if there are too many registers used, so if you are seeing failures involving that constant it's probably because you're not reducing the number of registers correctly.

Complete Compiler

After you finish this project, you will have code that is (nearly) ready to run on a "real" machine. All that remains is to convert it to true machine code. Toward this end, I have included a Y86 generator (src/y86.c), which includes a fairly staightforward routine that converts each ILOC instruction into corresponding Y86 instructions. I encourage you to look through it to see how this works.

To enable the Y86 output, simply uncomment the corresponding code in main.c after you complete P5. You will then need to assemble the resulting file using yas, so you need to do this on stu. Recall that you can execute this file with your CS 261 Y86 interpreter (or the reference interpreter y86ref) using the "-e" switch.

This "closes the loop" between CS 261 (your entry to the systems courses) and CS 432 (your advanced systems elective). Hopefully you have gained an appreciation for and a greater understanding of the complexity (and yet beauty!) of many aspects of traditional computer software systems. Along the way, you have built components for two large pieces of systems software (an interpreter and a compiler). Congratulations on completing this non-trivial accomplishment!

I hope that this opens the door to a lifetime of learning and improving your knowledge of the various computer systems that you interact with and use on a daily basis. It is important that we always have a wealth of individuals who understand not only how to use a system, but how the system works "under the hood." Good luck to you as you finish your time at JMU and begin your career!

Sample Input

def int add(int x, int y)
{
    return x + y;
}

def int main()
{
    int a;
    a = 3;
    return add(a, 2);
}

Sample Output

add:
  push bp                               // prologue
  i2i sp => bp
  addI sp, 0 => sp                      // allocate space for local variables (0 bytes)
  loadAI [bp+8] => r0 {x}
  loadAI [bp+12] => r1 {y}
  add r0, r1 => r0
  i2i r0 => ret
  jump l1                               // return (x+y)
l1:
  i2i bp => sp                          // epilogue
  pop bp
  return

main:
  push bp                               // prologue
  i2i sp => bp
  addI sp, -4 => sp                     // allocate space for local variables (4 bytes)
  loadI 3 => r0
  storeAI r0 => [bp-4] {a}              // a = 3
  loadAI [bp-4] => r0 {a}
  loadI 2 => r1
  push r1
  push r0
  call add
  addI sp, 8 => sp                      // de-allocate space for parameters (8 bytes)
  i2i ret => r0
  i2i r0 => ret
  jump l2                               // return add(a, 2)
l2:
  i2i bp => sp                          // epilogue
  pop bp
  return

Submission

Please submit your p5-regalloc.c file on stu AND to the appropriate assignment on Canvas by the due date indicated there.

To submit on stu, run the following command from your project folder:

/cs/students/cs432/f20/submit.sh p5

Code Reviews

Because this is the last project of the semester, there will be no code review.

Grading

Here is the grading rubric for this project:

Grade Description Requirements
A Exceptional
  • Handle all edge cases correctly
  • No compiler warnings or memory leaks
  • All of the below
B Good
  • Spill registers when necessary
  • Spill registers around function calls
  • All of the below
C Satisfactory
  • Perform basic register allocation
  • All of the below
D Deficient
  • Some evidence of a good-faith attempt

The above rubric shows the base grade possible if your submission meets the criteria listed. Most of the items listed will be assessed via automated testing, consisting of test cases that are mostly NOT provided to you in advance (so write your own tests!).

I will also be examining your submission manually for acceptable style and documentation, as well as the use of any unsafe functions. If your submission is deficient in any of these categories you may earn a numerical deduction, and if it is particularly egregious it will earn a half- or full-letter grade deduction depending on the severity of the issue. If you are unsure regarding my standards in this regard, I would suggest reviewing the style guide and list of unsafe functions from CS 261.