P4: Decaf Analysis
Objective
The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.
The goal of this part of the project is to gain experience implementing static analysis by constructing symbol tables and performing type checking in the third pass of our Decaf compiler.
Introduction
The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement static analysis for Decaf, which will be the third phase in our compiler. You will do this by constructing two AST visitors, one to build symbol tables and one to perform type checking.
You should download the project starter files from the "Files" tab in Canvas. The starter files include many new .java files as well as .class files representing compiled solutions.
Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.
If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).
For this project, you will be implementing subclasses of
StaticAnalysis
, a specialized AST visitor class that is designed to
collect error messages without halting execution. This is because errors found
during static analysis are not usually fatal to further analysis (as are errors
in lexing or parsing), and thus it makes sense to gather as many errors as
possible during static analysis and report them all to the user at once.
Making all of our static analysis passes children of a common
StaticAnalysis
superclass makes it easy for us to aggregate errors
in one place and report them together. Thus, rather than directly throwing
InvalidProgramException
, you should call addError
with
the error message to add it to the collection.
As discussed in class, a symbol table is a registry of bound symbols
representing variables or functions. Symbol tables could store many kinds
of information, but for this project we are primarily concerned with type
information. Every ASTProgram
, ASTFunction
, and
ASTBlock
should have a table containing symbols declared at that
level and their associated type.
For ASTPrograms
, the symbol table should hold types for all
global variables and functions declared in that program. For
ASTFunctions
, the table should hold types for all parameters to
that function. For ASTBlocks
, the table should hold types for all
variables declared in that block. For every function, there will be at least two
tables (at the function level and at the block level), and note that parameters
and local variables are in separate tables.
Each table represents a static scope and should store a pointer to
the table of its parent scope. For instance, a symbol table corresponding to a
method declaration should contain a reference to the table of the parent
program. This is done by passing the parent table to the
SymbolTable
constructor.
IMPORTANT: In order for the second part of this project to work properly, you will need to manually add the following "stub" function symbols to the global symbol table. These functions should not be defined anywhere in a Decaf program--they are provided by the Decaf runtime.
print_str : STR -> VOID
print_int : INT -> VOID
print_bool : BOOL -> VOID
Once your symbol tables are being generated properly, you should use them to perform type checking, which is the process of verifying that all actual types match their expected types. You should also perform any other kind of static analysis necessary to verify that the program is a valid Decaf program according to the language reference manual, paying special attention to the type rules to determine type safety. Here is a non-exhaustive list of other things to check:
- The program must contain a
main
function that takes no parameters and returns anint
- All variable declarations must be valid
- Array sizes must be greater than zero
- Arrays may not be accessed without an index
- Function calls must match the function's definition
return
statements must match their function's return type, whethervoid
or typedbreak
andcontinue
statements must appear only inside a loop
Note that one thing you do not need to check for is that all possible paths through a function terminate in a return statement. Doing this statically requires control-flow analysis, which is difficult on an AST IR. In addition, you do not need to check that all array accesses are within bounds. Doing this statically requires a very sophisticated data-flow analysis or symbolic execution, and in fact is not always possible without input constraints.
Assignment
WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system.
For this particular project, you should pay special attention to the following classes:
StaticAnalysis
- Base class for static analysis passes.Symbol
- Single Decaf symbol: a variable, array, function, or parameter.SymbolTable
- Contains a mapping from names toSymbol
objects.InvalidProgramException
- Special exception that your code should throw when you detect invalid types or some other semantic error.
Implement the BuildSymbolTables
and MyDecafAnalysis
classes. The former should build symbol tables, and the latter should perform
type checking and other static analysis as described above.
You do not need to write any code to output symbol tables. I have
provided that in the PrintDebugSymbolTables
visitor, which is now
called by DecafCompiler
. As long as your code builds the symbol
table correctly, it should be included in the standard output. See below for an
example.
For type checking, you must report all errors using the
addError
mechanism within the StaticAnalysis
class
hierarchy. You should be reporting at least one error for all invalid programs,
but no errors for all valid programs. All error messages should be descriptive
and comprehensive, and should contain any relevant debug information (e.g., line
numbers or variable names).
Sample Input
def int add(int x, int y) { return x + y; } def int main() { int a; a = 3; return add(a, 2); }
Sample Output
Program SYM TABLE: + print_str : void (str) + print_int : void (int) + print_bool : void (bool) + add : int (int,int) + main : int () Function SYM TABLE: + x : int + y : int Block SYM TABLE: Function SYM TABLE: Block SYM TABLE: + a : int
Submission
Please submit your MyDecafAnalysis.java
file
to the appropriate assignment on Canvas by the due date indicated there.
Code Reviews
One of the goals of this course is to encourage good software development practices, especially when building a large software system (such as a compiler). For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.
Submit your review on Canvas by midnight a week following the original project due date.
Grading
Here is the grading breakdown for this project:
Automated tests | 75 |
Instructor review | 15 |
Peer review | 10 |
TOTAL | 100 |