Decaf

P4: Code Generation

Objective

The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.

The goal of this part of the project is to gain experience implementing code generation by converting Decaf ASTs into a linear code (ILOC).

Introduction

The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement code generation for Decaf, which will be the fourth phase in our compiler. You will do this by traversing the annotated AST from the third phase, generating equivalent ILOC code.

The starter files are available on stu at the following path:

/cs/students/cs432/f23/p4-codegen.tar.gz

Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.

If you have questions about the existing code, please ask them on the #projects channel in Discord (see the link in Canvas).

As you study the code, you may find it useful to generate HTML or PDF references for the existing project files. To do this, run make docs and then see the resulting files in the "html" and "latex" subfolders. I have generated a copy of the HTML documentation for your convenience: P4 reference

If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).

For this project, you will be implementing a NodeVisitor as discussed in class. This visitor should handle the conversion from an annotated AST (as an ASTNode) to sequential ILOC code (as an InsnList).

To do this conversion, you will write visitor methods that generate code for each AST node. I recommend doing this using the "code" and "reg" attributes to store the generated code and any temporary result register, respectively, as we discussed in class. I have provided many helper functions in iloc.c and p4-codegen.c to aid in writing these methods.

To help you test your code generation, I wrote a simple ILOC simulator, which you can find in the iloc.c file. Its implementation may be of interest to you as you write your code generator. You can also use the simulator to test your ILOC output, and I have added a final pass to main.c that invokes the simulator on your generated ILOC code (with trace output on by default, which you can disable if you wish by changing the appropriate parameter in the call to run_simulator).

Assignment

WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system. Make sure you read the "Operational Semantics" and "Application Binary Interface" sections of the language reference carefully in addition to this description!

Implement code generation by providing an implementation for the following function:

InsnList* generate_code (ASTNode* tree);

For this particular project, you should pay special attention to the following files:

  • src/main.c - The overall driver for the compiler. Right now it implements the lexer, parser, static analysis, and code generation phases of compilation. You may change this file if it is useful for testing, but your submission should work with the original version.
  • src/p4-codegen.c - The code generation implementation. Your implementation of generate_code will go here, and you may add helper functions here if you find them useful (but note that they will be inaccessible to other modules--you should NOT modify p4-codegen.h).
  • include/iloc.h - Describes the interface to ILOCInsn and its associated functions/methods, which you should use to write your solution. It also contains the visitor that allocates memory for all symbols, which runs after static analysis but before code generation (setting the "localSize" attribute, which you should use to emit the stack adjustment in function prologues). It contains some new helper functions for working with the "code" and "reg" attributes of ASTNode structures. Finally, it contains the ILOC simulator. YOU SHOULD NOT MODIFY THIS FILE. The implementations are in iloc.c, which you may want to skim as well (although you shouldn't change those either).
  • tests - Testing framework folder. These files use the same general layout as described in the CS 261 project guide, so you may wish to review that document. UNLIKE IN CS 261, YOU WILL NOT BE PROVIDED ALL TEST CASES, only a few sanity test checks to help you estimate your progress in completing the project. You should significantly extend the provided test suite and convince yourself that your solution completely implements this phase of the compiler as described by the language reference and the description provided here.

This project is by design more open-ended than any of the previous projects in this course. There are infinitely many ILOC programs that are equivalent to any given Decaf program. Some are more efficient, some are easier to analyze, some are easier to read. Thus, your generated output will likely not exactly match the sample output below, and that is fine. I will primarily be testing your emitted code for correctness. You must also abide by the standard calling conventions as described in the language reference and as discussed in class.

To this end, the testing for this project is almost purely based on whether your emitted code produces the correct result (e.g., the return value or printed output). Thus, the output of the project as distributed is unlikely to be useful for debugging. I have included a debug_mode flag at the top of main.c that you should enable while developing. It will print all of the emitted ILOC code as well as execution trace dumps showing the status of the simulated ILOC machine while your program executes. IMPORTANT: You must disable this mode before running integration tests (which assume it is off) or submitting your solution.

Notes:

  • You can ignore the PHI instruction--it's for data flow analysis in static single-assignment form, which we will not be doing in this project.
  • Local variables will not work properly unless you at least initialize the base pointer in the prologue. The simulator initializes the stack pointer, so for C-level tests at least it is sufficient to emit only the "i2i SP => BP" part of the prologue.
  • The AllocateSymbols visitor (which runs between P3 and P4) allocates memory for all global and local variables, storing the BP offsets in the variable symbols and setting the "localSize" attribute (in bytes) for FuncDecl AST nodes, which you should use to emit the SP adjustment that allocates memory in a stack frame for local variables in function prologues.
  • Recursive functions will not work properly at this point in the project because we're re-using virtual registers across function invocations. We'll fix this in P5 by spilling registers at call sites so it's ok for now.
  • Don't forget to implement special cases for the output functions (e.g., print_int) using the PRINT ILOC instruction as noted in the Decaf language reference.
  • For this project, use the "--fdump-iloc" or "--fdump-iloc-tree" flags to the reference compiler to print the output after ILOC code generation. The latter generates an image of the AST with the "code" and "reg" attributes.

Sample Input

def int add(int x, int y)
{
    return x + y;
}

def int main()
{
    int a;
    a = 3;
    return add(a, 2);
}

Sample Output (with debug mode enabled)

add:
  push BP
  i2i SP => BP
  addI SP, 0 => SP
  loadAI [BP+16] => r0
  loadAI [BP+24] => r1
  add r0, r1 => r2
  i2i r2 => RET
  jump l0
l0:
  i2i BP => SP
  pop BP
  return
main:
  push BP
  i2i SP => BP
  addI SP, -8 => SP
  loadI 3 => r3
  storeAI r3 => [BP-8]
  loadAI [BP-8] => r4
  loadI 2 => r5
  push r5
  push r4
  call add
  addI SP, 16 => SP
  i2i RET => r6
  i2i r6 => RET
  jump l1
l1:
  i2i BP => SP
  pop BP
  return

RETURN VALUE = 5

Submission

Please submit your p4-codegen.c file on stu AND to the appropriate assignment on Canvas by the due date indicated there.

To submit on stu, run the following command from your project folder:

/cs/students/cs432/f23/submit.sh p4

Code Reviews

One of the goals of this course is to encourage good software development practices, especially when building a large software system (such as a compiler). For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.

Submit your review on Canvas by midnight a week following the original project due date.

Grading

Here is the grading breakdown for this project:

Instructor grade90
Code review10
TOTAL100

Here is the rubric for the "Instructor grade" part of the overall grade:

Grade Description Requirements
A Exceptional
  • Generate code for locations and assignments w/ arrays
  • Generate code for function calls w/ parameters
  • Handle all edge cases correctly (e.g., modulus operator and print functions)
  • No compiler warnings or memory leaks
  • All of the below
B Good
  • Generate code for boolean literals and expressions
  • Generate code for conditionals, while loops, breaks, and continues
  • Generate code for function calls w/o parameters
  • All of the below
C Satisfactory
  • Generate code for all binary and unary integer operators except modulus
  • Generate code for locations and assignments w/o arrays
  • All of the below
D Deficient
  • Generate code for integer literals
  • Generate code for return statements
  • Generate code for blocks
  • Generate code for the addition operator
F Unacceptable
  • Some evidence of a good-faith attempt

The above rubric shows the base grade possible if your submission meets the criteria listed. Most of the items listed will be assessed via automated testing, consisting of test cases that are mostly NOT provided to you in advance (so write your own tests!).

I will also be examining your submission manually for acceptable style and documentation, as well as the use of any unsafe functions. If your submission is deficient in any of these categories you may earn a numerical deduction, and if it is particularly egregious it will earn a half- or full-letter grade deduction depending on the severity of the issue. If you are unsure regarding my standards in this regard, I would suggest reviewing the style guide and list of unsafe functions from CS 261.