CS 432

P3: Static Analysis

Objective

The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.

The goal of this part of the project is to gain experience implementing static analysis (including both type inference and type checking as well as other verification) in the third pass of our Decaf compiler.

Introduction

The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement static analysis for Decaf, which will be the third phase in our compiler. You will do this by constructing a visitor pass that detects and reports errors.

The starter files are available on stu at the following path:

/cs/students/cs432/f21/p3-analysis.tar.gz

Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.

As you study the code, you may find it useful to generate HTML or PDF references for the existing project files. To do this, run make docs and then see the resulting files in the "html" and "latex" subfolders. I have generated a copy of the HTML documentation for your convenience: P3 reference

If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).

For this project, you will be implementing a NodeVisitor as discussed in class. This visitor should perform type inference and type checking, collecting error messages as they are detected without halting execution. This is because errors found during static analysis are not usually fatal to further analysis (as are errors in lexing or parsing), and thus it makes sense to gather as many errors as possible during static analysis and report them all to the user at once.

The list of errors should be stored in and ErrorList inside the visitor state information (i.e., the AnalysisData struct. I have provided a ErrorList_printf helper function and ERROR_LIST macro to facilitate error reporting. You can see an example of how to use these helpers in the provided lookup_symbol_with_reporting function.

Symbol Tables

As discussed in class, a symbol table is a registry of bound symbols representing variables or functions. Symbol tables could store many kinds of information, but for this project we are primarily concerned with type information. Every Program, FuncDecl, and Block AST node will have a table containing symbols declared at that level and their associated type. The implementation of symbol table construction has been provided to you as part of this project's distribution, and the code can be seen in symbol.h and symbol.c.

For a Program, the symbol table will hold types for all global variables and functions declared in that program. For FuncDecls, the table will hold types for all parameters to that function. For Blocks, the table will hold types for all variables declared in that block. For every function, there will be at least two tables (at the function level and at the block level), and note that parameters and local variables are in separate tables (the function table for the former and the block table for the latter).

Each table represents a static scope and will store a pointer to the table of its parent scope. For instance, a symbol table corresponding to a method declaration will contain a reference to the table of the parent program.

IMPORTANT: In order for the second part of this project to work properly with the built-in output routines, there will need to be manually-added symbols for them in the global symbol table. These functions should not be defined anywhere in a Decaf program--they are provided by the Decaf runtime. Here are their types:

print_str : STR -> VOID
print_int : INT -> VOID
print_bool : BOOL -> VOID

In this project, you should access the constructed symbol tables primarily using the provided lookup_symbol wrapper function to retrieve a symbol given its name. In some instances it may also be useful to access the SymbolTable for a particular node directly, and in those cases you can access it (by pointer) using ASTNode_get_attribute with the attribute name "symbolTable".

Assignment

WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system. Make sure you read the "Type Checking" section of the language reference carefully in addition to this description!

Implement syntax analysis by providing an implementation for the following function:

ErrorList* analyze (ASTNode* tree);

This function takes as input an ASTNode representing the root of the AST from the parsing phase. The function should return a (possibly-empty) list of errors detected during static analysis. You should use the generated symbol tables to perform type checking, which is the process of verifying that all actual types match their expected types. You should also perform any other kind of static analysis necessary to verify that the program is a valid Decaf program according to the language reference manual, paying special attention to the type rules to determine type safety.

Note that one thing you do not need to check for is that all possible paths through a function terminate in a return statement. Doing this statically requires control-flow analysis, which is difficult on an AST IR. In addition, you do not need to check that all array accesses are within bounds. Doing this statically requires a very sophisticated data-flow analysis or symbolic execution, and in fact is not always possible without input constraints.

For this particular project, you should pay special attention to the following files:

src/main.c - The overall driver for the compiler. Right now it implements the lexer, parser, and static analysis phases of compilation. You may change this file if it is useful for testing, but your submission should work with the original version.
src/p3-analysis.c - The static analysis implementation. Your implementation of analyze will go here, and you may add helper functions here if you find them useful (but note that they will be inaccessible to other modules--you should NOT modify p3-analysis.h).
include/symbol.h - Describes the interface to Symbol and SymbolTable structure and their associated functions/methods, which you should use to write your solution. It also contains the visitor that builds these symbol tables. YOU SHOULD NOT MODIFY THIS FILE. The implementations are in symbol.c, which you may want to skim as well (although you shouldn't change those either).
tests - Testing framework folder. These files use the same general layout as described in the CS 261 project guide, so you may wish to review that document. UNLIKE IN CS 261, YOU WILL NOT BE PROVIDED ALL TEST CASES, only a few sanity test checks to help you estimate your progress in completing the project. You should significantly extend the provided test suite and convince yourself that your solution completely implements this phase of the compiler as described by the language reference and the description provided here.

You do not need to write any code to output symbol tables. I have provided that in the PrintSymbols visitor, which is now called by main.c. See below for an example. The graphical output now includes any inferred type information (assuming you are correctly using the "type" attribute -- use the provided SET_INFERRED_TYPE macro!), and you can use that to check your type inference apart from your type checking (there are no unit or integration tests for the inference part specifically).

For type checking, you should report all errors using the ErrorList_printf mechanism described above. You should be reporting at least one error for all invalid programs, but no errors for all valid programs. All error messages should be descriptive and comprehensive, and should contain any relevant debug information (e.g., line numbers or variable names).

Sample Input

def int add(int x, int y)
{
    return x + y;
}

def int main()
{
    int a;
    a = 3;
    return add(a, 2);
}

Sample Output

Program [line 1]
SYM TABLE:
 print_int : (int) -> void
 print_bool : (bool) -> void
 print_str : (str) -> void
 add : (int, int) -> int
 main : () -> int

  FuncDecl name="add" return_type=int parameters={x:int,y:int} [line 1]
  SYM TABLE:
   x : int
   y : int

    Block [line 2]
    SYM TABLE:

  FuncDecl name="main" return_type=int parameters={} [line 6]
  SYM TABLE:

    Block [line 7]
    SYM TABLE:
     a : int

To generate a PNG (like the one above) from the DOT output, add the following line to main after the GenerateASTGraph traversal:

    system("dot -Tpng -o ast.png ast.dot");

This will require you to have the dot utility installed; this comes from the GraphViz package and should be available in most Linux-based OSes using your standard package manager.

Submission

Please submit your p3-analysis.c file on stu AND to the appropriate assignment on Canvas by the due date indicated there.

To submit on stu, run the following command from your project folder:

/cs/students/cs432/f21/submit.sh p3

Code Reviews

One of the goals of this course is to encourage good software development practices, especially when building a large software system (such as a compiler). For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.

Submit your review on Canvas by midnight a week following the original project due date.

Grading

Here is the grading breakdown for this project:

Instructor grade	90
Peer review	10
TOTAL	100

Here is the rubric for the "Instructor grade" part of the overall grade:

Grade	Description	Requirements
A	Exceptional	Handle all edge cases correctly No compiler warnings or memory leaks All of the below
B	Good	Reject all kinds of type mismatch errors requiring full type inference Reject duplicate symbols within a particular symbol table Reject array accesses without an index Report descriptive error messages w/ debug info (assessed manually) All of the below
C	Satisfactory	Reject any type mismatch errors that require only literal and location type inference Reject array size that aren't greater than zero Reject `return` statements that do not match their function's return type Reject `break` and `continue` statements outside a loop All of the below
D	Deficient	Reject `VOID` variables Reject undefined variables in locations Reject programs with missing or incorrect '`main`' function Accept all otherwise valid programs
F	Unacceptable	Some evidence of a good-faith attempt

The above rubric shows the base grade possible if your submission meets the criteria listed. Most of the items listed will be assessed via automated testing, consisting of test cases that are mostly NOT provided to you in advance (so write your own tests!).

I will also be examining your submission manually for acceptable style and documentation, as well as the use of any unsafe functions. If your submission is deficient in any of these categories you may earn a numerical deduction, and if it is particularly egregious it will earn a half- or full-letter grade deduction depending on the severity of the issue. If you are unsure regarding my standards in this regard, I would suggest reviewing the style guide and list of unsafe functions from CS 261.

Compilers - Fall 2021