PA5 - Decaf Code Generation
Goal
The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler. We will be implementing a compiler for a variant of the well-known Decaf language, which is a simple Java-like procedural programming language.
The goal of this part of the project is to build a three-address code (TAC) generator for Decaf ASTs.
Description
As before, you will need to re-download the AST specification file, which now has definitions for three-address code constructs. There is also a new version of the main driver.
Here is a template starter for this part of the project:
Three-address code (TAC) is a convenient format for representing machine-independent instructions. Here are the supported operations for our version of TAC:
TACInsn dest = op src1 (unary op) dest = src1 op src2 (binary op) TACCopy dest = src (copy) dest[idx] = src (indexed copy) dest = src[idx] ( " " ) TACLabel lbl: (label) TACJump goto lbl (unconditional jump) if var goto lbl (conditional jump) TACParam param var (pass variable parameter) param lit (pass literal parameter) TACCall call lbl (call function) dest = call lbl (call function w/ return value) TACRet ret (function return) ret var (function return w/ value)
The updated version of DecafAST.py
provides definitions for a
range of TAC objects. The forms above are grouped by their corresponding TAC
object from the file. Read this code and its documentation carefully and verify
that you understand each class's purpose.
Most of the TAC classes are merely structures for holding instruction data;
however, the TACProgram
class does a bit more. It represents a
sequence of TAC instructions and labels (providing list-like access with the
len
function and indexing using bracket notation), but it also
provides several utility methods that you will find helpful while you complete
this project:
gen(insn)
Appends an instruction onto the list of instructions.newAnonLabel()
Generates a new "anonymous" label; i.e., a label without a user-specified name. This is useful for creating jump targets for Decaf code structures like conditionals and loops. Returns aTACLabel
with a name that includes a tag and a monotonically-increasing number.newTempVar()
Generates a new temporary variable; i.e., a variable without a user-defined name. This is crucial for a variety of code generation applications. Returns aTACVar
with a name that includes a tag and a monotonically-increasing number.pushLoop(continueLabel, breakLabel)
popLoop()
You can use theASTProgram
to track continue/break labels while you are in a loop. Simply call thepushLoop
function with the labels when you are ready to recursively emit the body of the loop, then callpopLoop
when you are done.currentContinueLabel()
currentBreakLabel()
If you have usedpushLoop
andpopLoop
properly, you can use these functions to retrieve the correct labels when generating code forcontinue
orbreak
statements.
Another important thing to note is the difference between
TACVar
and TACIdxVar
. The former represents a scalar
variable, while the latter represents an indexed (array) variable. For
simplicity, indexed variables may only be used in simple copy instructions
(TACCopy
), and only one may appear per instruction.
You should write a function generate_tac
that takes an AST
object (with pre-generated symbol tables). This function should generate TAC for
all of the methods in the program, adding all of the instructions in order to a
TACProgram
object and returning it.
As outlined in the template file, I recommend creating two recursive
functions gen_tac
and gen_tac_val
. The former should
handle levels of the AST that do not require returning results in a temporary
variable (e.g., ASTMethodDecl
, ASTBlock
,
ASTAssignment
, ASTConditional
,
ASTWhileLoop
, etc.), while the latter should handle levels that do
(e.g., ASTExpr
, ASTLocation
).
Sample Input
// gcd.decaf class Program { int t; // uses euclid's algorithm int gcd(int a, int b) { while (b > 0) { t = b; b = a % b; a = t; } return a; } // program entry point void main() { t = gcd(24, 196); callout("printf", "%d\n", t); } }
Sample Output
gcd: @l0: 't0 = b > 0 't1 = ! 't0 if 't1 goto @l1 t = b 't2 = a % b b = 't2 a = t @l1: ret a main: param 24 param 196 't3 = call gcd t = 't3 param "%d\n" param t call "printf" ret
Other Inputs and Outputs
You can find more examples of input and output on the class server in the following folders:
/public/cs630/decaf/inputs
/public/cs630/decaf/outputs-codegen
Submission
DUE: Friday, Apr 3, 2015, at 23:59 (11:59PM) EST
Recommended file names (for Python):
Decaf.g4 (grammar, previous project) DecafLexer.py (auto-generated using ANTLR4) DecafParser.py " DecafListener.py " DecafCompiler.py (driver, provided above) DecafAST.py (AST framework, provided above) DecafASTConverter.py (converter, previous project) DecafAnalysis.py (analysis routines, previous project) DecafCodeGen.py (TAC generation routines)
Submit a zip or tarball with your complete solution on Canvas. Your code must compile and work on the class server. Part of your grade will be based on automated testing and part will be based on style considerations. Please make sure that your project adheres to the specification exactly to avoid difficulties with the automated testing framework, and make sure that your code is clean and documented.