P3: Static Analysis
Objective
The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.
The goal of this part of the project is to gain experience implementing static analysis (including both type inference and type checking as well as other verification) in the third pass of our Decaf compiler.
Introduction
The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement static analysis for Decaf, which will be the third phase in our compiler. You will do this by constructing a visitor pass that detects and reports errors.
The starter files are available on stu at the following path:
/cs/students/cs432/sp23/p3-analysis.tar.gz
Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.
As you study the code, you may find it useful to generate HTML or PDF
references for the existing project files. To do this, run make docs
and then see the resulting files in the "html" and "latex"
subfolders. I have generated a copy of the HTML documentation for your
convenience: P3 reference
If you have questions about the existing code, please ask them on the #projects channel in Discord (see the link in Canvas).
For this project, you will be implementing a NodeVisitor
as
discussed in class. This visitor should perform type inference and type
checking, collecting error messages as they are detected without halting
execution. This is because errors found during static analysis are not usually
fatal to further analysis (as are errors in lexing or parsing), and thus it
makes sense to gather as many errors as possible during static analysis and
report them all to the user at once.
The list of errors should be stored in and ErrorList
inside the
visitor state information (i.e., the AnalysisData
struct. I have
provided a ErrorList_printf
helper function and
ERROR_LIST
macro to facilitate error reporting. You can see an
example of how to use these helpers in the provided
lookup_symbol_with_reporting
function.
Symbol Tables
As discussed in class, a symbol table is a registry of bound symbols
representing variables or functions. Symbol tables could store many kinds
of information, but for this project we are primarily concerned with type
information. Every Program
, FuncDecl
, and
Block
AST node will have a table containing symbols declared at
that level and their associated type. The implementation of symbol table
construction has been provided to you as part of this project's distribution,
and the code can be seen in symbol.h
and symbol.c
.
For a Program
, the symbol table will hold types for all
global variables and functions declared in that program. For
FuncDecls
, the table will hold types for all parameters to
that function. For Blocks
, the table will hold types for all
variables declared in that block. For every function, there will be at least two
tables (at the function level and at the block level), and note that parameters
and local variables are in separate tables (the function table for the former
and the block table for the latter).
Each table represents a static scope and will store a pointer to the table of its parent scope. For instance, a symbol table corresponding to a method declaration will contain a reference to the table of the parent program.
IMPORTANT: In order for the second part of this project to work properly with the built-in output routines, there will need to be manually-added symbols for them in the global symbol table. These functions should not be defined anywhere in a Decaf program--they are provided by the Decaf runtime. Here are their types:
print_str : STR -> VOID
print_int : INT -> VOID
print_bool : BOOL -> VOID
In this project, you should access the constructed symbol tables primarily using the provided lookup_symbol wrapper function to retrieve a symbol given its name. In some instances it may also be useful to access the SymbolTable for a particular node directly, and in those cases you can access it (by pointer) using ASTNode_get_attribute with the attribute name "symbolTable".
Assignment
WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system. Make sure you read the "Type Checking" section of the language reference carefully in addition to this description!
Implement syntax analysis by providing an implementation for the following function:
ErrorList* analyze (ASTNode* tree);
This function takes as input an ASTNode
representing the root
of the AST from the parsing phase. The function should return a (possibly-empty)
list of errors detected during static analysis. You should use the generated symbol tables to
perform type checking, which is the process of verifying that all actual types
match their expected types. You should also perform any other kind of static
analysis necessary to verify that the program is a valid Decaf program according
to the language reference manual, paying special attention to the type rules to
determine type safety.
Note that one thing you do not need to check for is that all possible paths through a function terminate in a return statement. Doing this statically requires control-flow analysis, which is difficult on an AST IR. In addition, you do not need to check that all array accesses are within bounds. Doing this statically requires a very sophisticated data-flow analysis or symbolic execution, and in fact is not always possible without input constraints.
For this particular project, you should pay special attention to the following files:
src/main.c
- The overall driver for the compiler. Right now it implements the lexer, parser, and static analysis phases of compilation. You may change this file if it is useful for testing, but your submission should work with the original version.src/p3-analysis.c
- The static analysis implementation. Your implementation ofanalyze
will go here, and you may add helper functions here if you find them useful (but note that they will be inaccessible to other modules--you should NOT modifyp3-analysis.h
).include/symbol.h
- Describes the interface toSymbol
andSymbolTable
structure and their associated functions/methods, which you should use to write your solution. It also contains the visitor that builds these symbol tables. YOU SHOULD NOT MODIFY THIS FILE. The implementations are insymbol.c
, which you may want to skim as well (although you shouldn't change those either).tests
- Testing framework folder. These files use the same general layout as described in the CS 261 project guide, so you may wish to review that document. UNLIKE IN CS 261, YOU WILL NOT BE PROVIDED ALL TEST CASES, only a few sanity test checks to help you estimate your progress in completing the project. You should significantly extend the provided test suite and convince yourself that your solution completely implements this phase of the compiler as described by the language reference and the description provided here.
You do not need to write any code to output symbol tables. I have
provided that in the PrintSymbols
visitor, which is now
called by main.c
. See below for an example. The graphical output
now includes any inferred type information (assuming you are correctly using
the "type" attribute -- use the provided SET_INFERRED_TYPE
macro!),
and you can use that to check your type inference apart from your type checking
(there are no unit or integration tests for the inference part specifically).
For type checking, you should report all errors using the
ErrorList_printf
mechanism described above. You should be reporting
at least one error for all invalid programs, but no errors for all valid
programs. All error messages should be descriptive and comprehensive, and should
contain any relevant debug information (e.g., line numbers or variable names).
Sample Input
def int add(int x, int y) { return x + y; } def int main() { int a; a = 3; return add(a, 2); }
Sample Output
Program [line 1] SYM TABLE: print_int : (int) -> void print_bool : (bool) -> void print_str : (str) -> void add : (int, int) -> int main : () -> int FuncDecl name="add" return_type=int parameters={x:int,y:int} [line 1] SYM TABLE: x : int y : int Block [line 2] SYM TABLE: FuncDecl name="main" return_type=int parameters={} [line 6] SYM TABLE: Block [line 7] SYM TABLE: a : int
Submission
Please submit your p3-analysis.c
file on stu AND to the
appropriate assignment on Canvas by the due date indicated there.
To submit on stu, run the following command from your project folder:
/cs/students/cs432/sp23/submit.sh p3
Code Reviews
One of the goals of this course is to encourage good software development practices, especially when building a large software system (such as a compiler). For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.
Submit your review on Canvas by midnight a week following the original project due date.
Grading
Here is the grading breakdown for this project:
Instructor grade | 90 |
Peer review | 10 |
TOTAL | 100 |
Here is the rubric for the "Instructor grade" part of the overall grade:
Grade | Description | Requirements |
---|---|---|
A | Exceptional |
|
B | Good |
|
C | Satisfactory |
|
D | Deficient |
|
F | Unacceptable |
|
The above rubric shows the base grade possible if your submission meets the criteria listed. Most of the items listed will be assessed via automated testing, consisting of test cases that are mostly NOT provided to you in advance (so write your own tests!).
I will also be examining your submission manually for acceptable style and documentation, as well as the use of any unsafe functions. If your submission is deficient in any of these categories you may earn a numerical deduction, and if it is particularly egregious it will earn a half- or full-letter grade deduction depending on the severity of the issue. If you are unsure regarding my standards in this regard, I would suggest reviewing the style guide and list of unsafe functions from CS 261.