PA4: Decaf Analysis
Objective
The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.
The goal of this part of the project is to gain experience implementing static analysis by constructing symbol tables and performing type checking in the third pass of our Decaf compiler.
Introduction
The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement static analysis for Decaf, which will be the third phase in our compiler. You will do this by constructing two AST visitors, one to build symbol tables and one to perform type checking.
You should download the project starter files from the "Files" tab in Canvas. The starter files include many new .java files as well as .class files representing compiled solutions.
Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.
If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).
For this project, you will be implementing subclasses of
StaticAnalysis
, a specialized AST visitor class that is designed to
collect error messages without halting execution. This is because errors found
during static analysis are not usually fatal to further analysis (as are errors
in lexing or parsing), and thus it makes sense to gather as many errors as
possible during static analysis and report them all to the user at once.
Making all of our static analysis passes children of a common
StaticAnalysis
superclass makes it easy for us to aggregate errors
in one place and report them together. Thus, rather than directly throwing
InvalidProgramException
, you should call addError
with
the exception to add it to the collection.
As discussed in class, a symbol table is a registry of bound symbols, representing fields, methods, or variables. Symbol tables could store many kinds of information, but for this project we are primarily concerned with type information.
IMPORTANT: Every ASTProgram
,
ASTFunction
, and ASTBlock
should have a table
containing symbols declared at that level and their associated type. The table
should be stored as an annotation with the key "symbolTable"
.
For ASTPrograms
, the symbol table should hold types for all
global variables and functions declared in that program. For
ASTFunctions
, the table should hold types for all parameters to
that function. For ASTBlocks
, the table should hold types for all
variables declared in that block.
Note that each table represents a static scope and should store a pointer to
the table of its parent scope. For instance, a symbol table corresponding to a
method declaration should contain a reference to the table of the parent
program. This is done by passing the parent table to the
SymbolTable
constructor.
IMPORTANT: In order for the second part of this project to work properly, you will need to manually add the following "stub" function symbols to the global symbol table. These functions should not be defined anywhere in a Decaf program--they are provided by the Decaf runtime.
print_str : STR -> VOID
print_int : INT -> VOID
print_bool : BOOL -> VOID
Once your symbol tables are being generated properly, you should use them to perform type checking, which is the process of verifying that all actual types match their expected types. You should also perform any other kind of static analysis necessary to verify that the program is a valid Decaf program according to the language reference manual. Here is a list of things to check (note that this list may not be exhaustive--check the language reference document carefully for other cases):
- The program must contain a
main
function that takes no parameters and returns anint
- The program must not contain duplicate field or function names
- The program must not contain duplicate variables within a single scope
- Array sizes must be greater than zero
- Only global variables may be arrays
- Arrays may not be used in expressions without an index
- Data types of left-hand side and right-hand side of assignments must match
- Data types of equality operation operands must match
- Only ints may be used in arithmetic or relational expressions
- Only bools may be used in conditional expressions
- Only bools may be used for if and while conditionals
- Only arrays may be used in location dereferences
- Only ints may be used as array indices
- All function and variable references are valid in their scope
- Actual function arguments must match the function's formal parameters
return
statements must match their function's return type, whethervoid
or typedbreak
andcontinue
statements must appear only inside a loop
HINT: After you have correctly built symbol tables, you can access
this information from MyDecafAnalysis
by using the provided
lookupSymbol
method in DecafAnalysis
(the parent of
MyDecafAnalysis
). That function takes an ASTNode
reference and begins its search there.
HINT: I highly recommend writing a set of helper
methods called getType
that take the various child classes of
ASTExpression
, returing the inferred type of that expression. This
will help immensely during type checking, allowing you to cleanly check an
expressions actual type vs. its expected type.
HINT: Refer to the type rules in the language reference for other
guidance regarding type checking. Each type rule gives a set of antecedent
statements (above the horizontal line) that must be true in order to infer the
conclustion statement (below the line). I recommend using the expression type
rules to help write the getType
method mentioned above, and using
the statement type rules to do type checking in the appropriate visitor methods.
HINT: Remember, you may assume that all inputs to this analysis phase have been cleared by both the lexer and the parser; thus, they will not have syntax errors. For example, you don't need to check to make sure an if-statement has a conditional because the parser guarantees that it does.
Assignment
WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system.
For this particular project, you should pay special attention to the following classes:
ASTVisitor
andDefaultASTVisitor
- Interface and base class for AST visitors.StaticAnalysis
- Base class for static analysis passes (inherits fromDefaultASTVisitor
).Symbol
- Single Decaf symbol: a variable, array, function, or parameter.SymbolTable
- Contains a mapping from names toSymbol
objects.InvalidProgramException
- Special exception that your code should throw when you detect invalid types or some other semantic error.
Implement the BuildSymbolTables
and MyDecafAnalysis
classes. The former should build symbol tables, and the latter should perform
type checking and other static analysis as described above.
You do not need to write any code to output symbol tables. I have
provided that in the PrintDebugSymbolTables
visitor, which is now
called by DecafCompiler
. As long as your code builds the symbol
table correctly, it should be included in the standard output. See below for an
example.
For type checking, you must report all errors using the
addError
mechanism within the StaticAnalysis
class
hierarchy. You should be reporting at least one error for all invalid programs,
but no errors for all valid programs.
Sample Input
def int add(int x, int y) { return x + y; } def int main() { int a; a = 3; return add(a, 2); }
Sample Output
Program SYM TABLE: + print_str : void (str) + print_int : void (int) + print_bool : void (bool) + add : int (int,int) + main : int () Function SYM TABLE: + x : int + y : int Block SYM TABLE: Function SYM TABLE: Block SYM TABLE: + a : int