Decaf

P4: Decaf Analysis

Objective

The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.

The goal of this part of the project is to gain experience implementing static analysis by constructing symbol tables and performing type checking in the third pass of our Decaf compiler.

Introduction

The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement static analysis for Decaf, which will be the third phase in our compiler. You will do this by constructing two AST visitors, one to build symbol tables and one to perform type checking.

You should download the project starter files from the "Files" tab in Canvas. The starter files include many new .java files as well as .class files representing compiled solutions.

Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.

If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).

For this project, you will be implementing subclasses of StaticAnalysis, a specialized AST visitor class that is designed to collect error messages without halting execution. This is because errors found during static analysis are not usually fatal to further analysis (as are errors in lexing or parsing), and thus it makes sense to gather as many errors as possible during static analysis and report them all to the user at once.

Making all of our static analysis passes children of a common StaticAnalysis superclass makes it easy for us to aggregate errors in one place and report them together. Thus, rather than directly throwing InvalidProgramException, you should call addError with the error message to add it to the collection.

As discussed in class, a symbol table is a registry of bound symbols representing variables or functions. Symbol tables could store many kinds of information, but for this project we are primarily concerned with type information. Every ASTProgram, ASTFunction, and ASTBlock should have a table containing symbols declared at that level and their associated type.

For ASTPrograms, the symbol table should hold types for all global variables and functions declared in that program. For ASTFunctions, the table should hold types for all parameters to that function. For ASTBlocks, the table should hold types for all variables declared in that block. For every function, there will be at least two tables (at the function level and at the block level), and note that parameters and local variables are in separate tables.

Each table represents a static scope and should store a pointer to the table of its parent scope. For instance, a symbol table corresponding to a method declaration should contain a reference to the table of the parent program. This is done by passing the parent table to the SymbolTable constructor.

IMPORTANT: In order for the second part of this project to work properly, you will need to manually add the following "stub" function symbols to the global symbol table. These functions should not be defined anywhere in a Decaf program--they are provided by the Decaf runtime.

  • print_str : STR -> VOID
  • print_int : INT -> VOID
  • print_bool : BOOL -> VOID

Once your symbol tables are being generated properly, you should use them to perform type checking, which is the process of verifying that all actual types match their expected types. You should also perform any other kind of static analysis necessary to verify that the program is a valid Decaf program according to the language reference manual, paying special attention to the type rules to determine type safety. Here is a non-exhaustive list of other things to check:

  • The program must contain a main function that takes no parameters and returns an int
  • All variable declarations must be valid
  • Array sizes must be greater than zero
  • Arrays may not be accessed without an index
  • Function calls must match the function's definition
  • return statements must match their function's return type, whether void or typed
  • break and continue statements must appear only inside a loop

Note that one thing you do not need to check for is that all possible paths through a function terminate in a return statement. Doing this statically requires control-flow analysis, which is difficult on an AST IR. In addition, you do not need to check that all array accesses are within bounds. Doing this statically requires a very sophisticated data-flow analysis or symbolic execution, and in fact is not always possible without input constraints.

Assignment

WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system.

For this particular project, you should pay special attention to the following classes:

  • StaticAnalysis - Base class for static analysis passes.
  • Symbol - Single Decaf symbol: a variable, array, function, or parameter.
  • SymbolTable - Contains a mapping from names to Symbol objects.
  • InvalidProgramException - Special exception that your code should throw when you detect invalid types or some other semantic error.

Implement the BuildSymbolTables and MyDecafAnalysis classes. The former should build symbol tables, and the latter should perform type checking and other static analysis as described above.

You do not need to write any code to output symbol tables. I have provided that in the PrintDebugSymbolTables visitor, which is now called by DecafCompiler. As long as your code builds the symbol table correctly, it should be included in the standard output. See below for an example.

For type checking, you must report all errors using the addError mechanism within the StaticAnalysis class hierarchy. You should be reporting at least one error for all invalid programs, but no errors for all valid programs. All error messages should be descriptive and comprehensive, and should contain any relevant debug information (e.g., line numbers or variable names).

Sample Input

def int add(int x, int y)
{
    return x + y;
}

def int main()
{
    int a;
    a = 3;
    return add(a, 2);
}

Sample Output

Program
SYM TABLE:
  + print_str : void (str)
  + print_int : void (int)
  + print_bool : void (bool)
  + add : int (int,int)
  + main : int ()

  Function
  SYM TABLE:
    + x : int
    + y : int

    Block
    SYM TABLE:

  Function
  SYM TABLE:

    Block
    SYM TABLE:
      + a : int
AST for add.decaf

Submission

Please submit your MyDecafAnalysis.java file to the appropriate assignment on Canvas by the due date indicated there.

Code Reviews

One of the goals of this course is to encourage good software development practices, especially when building a large software system (such as a compiler). For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.

Submit your review on Canvas by midnight a week following the original project due date.

Grading

Here is the grading breakdown for this project:

Automated tests75
Instructor review15
Peer review10
TOTAL100