Decaf

PA4: Decaf Analysis

Objective

The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.

The goal of this part of the project is to gain experience implementing static analysis by constructing symbol tables and performing type checking in the third pass of our Decaf compiler.

Introduction

The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement static analysis for Decaf, which will be the third phase in our compiler. You will do this by constructing two AST visitors, one to build symbol tables and one to perform type checking.

You should download the project starter files from the "Files" tab in Canvas. The starter files include many new .java files as well as .class files representing compiled solutions.

Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.

If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).

For this project, you will be implementing subclasses of StaticAnalysis, a specialized AST visitor class that is designed to collect error messages without halting execution. Unlike errors in lexing or parsing, errors found during static analysis are not usually fatal to further analysis, so it makes sense to gather as many errors as possible during static analysis and report them all to the user at once.

Making all of our static analysis passes children of a common StaticAnalysis superclass makes it easy for us to aggregate errors in one place and report them together. Thus, rather than directly throwing InvalidProgramException, you should call addError with the exception to add it to the collection.
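The collect-instead-of-throw pattern can be sketched as follows. This is only a minimal, self-contained illustration: SketchAnalysis and SketchError are hypothetical stand-ins for the provided StaticAnalysis and InvalidProgramException classes, whose actual constructors and method signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for InvalidProgramException (the real class may differ).
class SketchError extends Exception {
    SketchError(String message) { super(message); }
}

// Hypothetical stand-in for StaticAnalysis: collects errors instead of halting.
class SketchAnalysis {
    private final List<String> errors = new ArrayList<>();

    // Record the error and keep going, so later problems are still found.
    void addError(SketchError e) { errors.add(e.getMessage()); }

    List<String> getErrors() { return errors; }

    // Toy check: every array size must be greater than zero.
    void checkArraySizes(int[] sizes) {
        for (int size : sizes) {
            if (size <= 0) {
                addError(new SketchError("Array size must be positive: " + size));
            }
        }
    }
}

class ErrorCollectionDemo {
    public static void main(String[] args) {
        SketchAnalysis analysis = new SketchAnalysis();
        analysis.checkArraySizes(new int[] { 4, 0, -2 });
        // Both invalid sizes are reported in a single run.
        analysis.getErrors().forEach(System.out::println);
    }
}
```

Because the exception object is constructed but never thrown, the loop continues past the first invalid size and the second one is reported as well.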

As discussed in class, a symbol table is a registry of bound symbols, representing fields, methods, or variables. Symbol tables could store many kinds of information, but for this project we are primarily concerned with type information.

IMPORTANT: Every ASTProgram, ASTFunction, and ASTBlock should have a table containing symbols declared at that level and their associated type. The table should be stored as an annotation with the key "symbolTable".

For ASTPrograms, the symbol table should hold types for all global variables and functions declared in that program. For ASTFunctions, the table should hold types for all parameters to that function. For ASTBlocks, the table should hold types for all variables declared in that block.

Note that each table represents a static scope and should store a reference to the table of its parent scope. For instance, the symbol table for a function declaration should contain a reference to the table of the parent program. This is done by passing the parent table to the SymbolTable constructor.
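A chain of scopes linked through parent references might look roughly like the sketch below. ChainedTable is a hypothetical stand-in for the provided SymbolTable class, and types are stored as plain strings purely for illustration; the real class stores Symbol objects and its method names may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the provided SymbolTable class; the names and
// signatures here are assumptions made for illustration only.
class ChainedTable {
    private final ChainedTable parent;  // enclosing scope, or null for the global scope
    private final Map<String, String> types = new LinkedHashMap<>();

    ChainedTable(ChainedTable parent) { this.parent = parent; }

    // Returns false if the name is already declared in THIS scope.
    boolean declare(String name, String type) {
        if (types.containsKey(name)) return false;
        types.put(name, type);
        return true;
    }

    // Search this scope first, then walk outward through the parents.
    String lookup(String name) {
        for (ChainedTable t = this; t != null; t = t.parent) {
            String type = t.types.get(name);
            if (type != null) return type;
        }
        return null;  // not bound in any enclosing scope
    }
}

class ScopeChainDemo {
    public static void main(String[] args) {
        ChainedTable program = new ChainedTable(null);
        program.declare("add", "int (int,int)");
        ChainedTable function = new ChainedTable(program);
        function.declare("x", "int");
        ChainedTable block = new ChainedTable(function);

        System.out.println(block.lookup("x"));        // found in the function scope
        System.out.println(block.lookup("add"));      // found in the program scope
        System.out.println(block.lookup("missing"));  // not bound anywhere
    }
}
```

Note that declare only inspects the current scope, so an inner scope may shadow an outer name, while a second declaration in the same scope is rejected.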

IMPORTANT: In order for the second part of this project to work properly, you will need to manually add the following "stub" function symbols to the global symbol table. These functions should not be defined anywhere in a Decaf program--they are provided by the Decaf runtime.

  • print_str : STR -> VOID
  • print_int : INT -> VOID
  • print_bool : BOOL -> VOID
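Registering the three stubs before any user declarations could be sketched as follows. StubSignature and RuntimeStubs are hypothetical names invented for this example; in your solution you would construct the provided Symbol objects and insert them into the program's SymbolTable, using whatever constructors those classes actually offer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical function signature type; the provided Symbol class
// likely has a different constructor and different fields.
class StubSignature {
    final String returnType;
    final List<String> paramTypes;

    StubSignature(String returnType, List<String> paramTypes) {
        this.returnType = returnType;
        this.paramTypes = paramTypes;
    }

    @Override
    public String toString() {
        return returnType + " (" + String.join(",", paramTypes) + ")";
    }
}

class RuntimeStubs {
    // Builds a global scope pre-populated with the three runtime functions.
    static Map<String, StubSignature> newGlobalScope() {
        Map<String, StubSignature> globals = new LinkedHashMap<>();
        globals.put("print_str", new StubSignature("void", List.of("str")));
        globals.put("print_int", new StubSignature("void", List.of("int")));
        globals.put("print_bool", new StubSignature("void", List.of("bool")));
        return globals;
    }

    public static void main(String[] args) {
        newGlobalScope().forEach((name, sig) ->
            System.out.println("  + " + name + " : " + sig));
    }
}
```

Adding the stubs first means that a later call such as print_int(3) resolves like any other function reference.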

Once your symbol tables are being generated properly, you should use them to perform type checking, which is the process of verifying that all actual types match their expected types. You should also perform any other kind of static analysis necessary to verify that the program is a valid Decaf program according to the language reference manual. Here is a list of things to check (note that this list may not be exhaustive--check the language reference document carefully for other cases):

  • The program must contain a main function that takes no parameters and returns an int
  • The program must not contain duplicate field or function names
  • The program must not contain duplicate variables within a single scope
  • Array sizes must be greater than zero
  • Only global variables may be arrays
  • Arrays may not be used in expressions without an index
  • Data types of left-hand side and right-hand side of assignments must match
  • Data types of equality operation operands must match
  • Only ints may be used in arithmetic or relational expressions
  • Only bools may be used in conditional expressions
  • Only bools may be used for if and while conditionals
  • Only arrays may be used in location dereferences
  • Only ints may be used as array indices
  • All function and variable references are valid in their scope
  • Actual function arguments must match the function's formal parameters
  • return statements must match their function's return type, whether void or typed
  • break and continue statements must appear only inside a loop
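Several of these checks depend on context rather than on types. As one example, the rule about break statements can be handled by counting enclosing loops while walking the tree. The sketch below uses a tiny hypothetical statement tree (MiniStmt, MiniWhile, MiniBreak) in place of the provided AST classes and collects messages in a plain list instead of calling addError.

```java
import java.util.ArrayList;
import java.util.List;

// Tiny hypothetical statement tree; the provided AST classes are richer.
abstract class MiniStmt { }
class MiniBreak extends MiniStmt { }
class MiniWhile extends MiniStmt {
    final List<MiniStmt> body;
    MiniWhile(List<MiniStmt> body) { this.body = body; }
}

class LoopDepthCheck {
    private int loopDepth = 0;  // number of enclosing while loops
    final List<String> errors = new ArrayList<>();

    void check(List<MiniStmt> stmts) {
        for (MiniStmt s : stmts) {
            if (s instanceof MiniWhile) {
                loopDepth++;                  // entering a loop
                check(((MiniWhile) s).body);
                loopDepth--;                  // leaving the loop
            } else if (s instanceof MiniBreak && loopDepth == 0) {
                errors.add("break outside of a loop");
            }
        }
    }

    public static void main(String[] args) {
        LoopDepthCheck checker = new LoopDepthCheck();
        // Models: while (...) { break; } followed by a stray break.
        checker.check(List.of(new MiniWhile(List.of(new MiniBreak())), new MiniBreak()));
        System.out.println(checker.errors);
    }
}
```

The same counter would cover continue statements; only the stray break at the top level is reported here, because the first one sits inside a loop.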

HINT: After you have correctly built symbol tables, you can access this information from MyDecafAnalysis by using the provided lookupSymbol method in DecafAnalysis (the parent of MyDecafAnalysis). That function takes an ASTNode reference and begins its search there.

HINT: I highly recommend writing a set of helper methods called getType that take the various child classes of ASTExpression and return the inferred type of that expression. This will help immensely during type checking, allowing you to cleanly check an expression's actual type against its expected type.
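One possible shape for such helpers is sketched below, using hypothetical miniature classes (MiniExpr, MiniIntLit, MiniBoolLit, MiniBinary) and a hypothetical MiniType enum rather than the provided ASTExpression subclasses. Because Java resolves overloads at compile time, the sketch includes a dispatcher that accepts the base class and forwards to the specific overload.

```java
// Hypothetical miniature expression classes and type enum; the provided
// ASTExpression subclasses have different names and fields.
enum MiniType { INT, BOOL, UNKNOWN }

abstract class MiniExpr { }
class MiniIntLit extends MiniExpr { }
class MiniBoolLit extends MiniExpr { }
class MiniBinary extends MiniExpr {
    final String op;
    final MiniExpr left, right;
    MiniBinary(String op, MiniExpr left, MiniExpr right) {
        this.op = op; this.left = left; this.right = right;
    }
}

class TypeInference {
    // Java picks overloads at compile time, so a dispatcher on the
    // base class is needed to reach the subclass-specific helpers.
    static MiniType getType(MiniExpr e) {
        if (e instanceof MiniIntLit) return getType((MiniIntLit) e);
        if (e instanceof MiniBoolLit) return getType((MiniBoolLit) e);
        if (e instanceof MiniBinary) return getType((MiniBinary) e);
        return MiniType.UNKNOWN;
    }

    static MiniType getType(MiniIntLit e) { return MiniType.INT; }
    static MiniType getType(MiniBoolLit e) { return MiniType.BOOL; }

    // The result type depends only on the operator here; checking the
    // operand types is a separate step done by the type checker.
    static MiniType getType(MiniBinary e) {
        switch (e.op) {
            case "+": case "-": case "*": case "/": case "%":
                return MiniType.INT;
            case "<": case "<=": case ">": case ">=":
            case "==": case "!=": case "&&": case "||":
                return MiniType.BOOL;
            default:
                return MiniType.UNKNOWN;
        }
    }

    // Compare an expression's inferred type against what the context expects.
    static boolean hasType(MiniExpr e, MiniType expected) {
        return getType(e) == expected;
    }

    public static void main(String[] args) {
        MiniExpr sum = new MiniBinary("+", new MiniIntLit(), new MiniIntLit());
        System.out.println(getType(sum));
        System.out.println(hasType(sum, MiniType.BOOL));
    }
}
```

A check such as "only bools may be used for if and while conditionals" then reduces to a single hasType call on the condition, with an error recorded when it returns false.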

HINT: Refer to the type rules in the language reference for other guidance regarding type checking. Each type rule gives a set of antecedent statements (above the horizontal line) that must be true in order to infer the conclusion statement (below the line). I recommend using the expression type rules to help write the getType methods mentioned above, and using the statement type rules to do type checking in the appropriate visitor methods.
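As an illustration of the notation only (this rule is not copied from the language reference, so check the reference for the actual rules and symbols), a rule consistent with the requirement that only ints may be used in arithmetic expressions might be written like this, where the environment symbol Γ is an assumption standing for the symbol tables in scope:

```latex
\frac{\Gamma \vdash e_1 : \text{int} \qquad \Gamma \vdash e_2 : \text{int}}
     {\Gamma \vdash e_1 + e_2 : \text{int}}
```

Read it from the bottom up: to conclude that e1 + e2 has type int, both operands must first be shown to have type int. In code, that corresponds to inferring the type of each operand and comparing it against int before accepting the expression.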

HINT: Remember, you may assume that all inputs to this analysis phase have been cleared by both the lexer and the parser; thus, they will not have syntax errors. For example, you don't need to check to make sure an if-statement has a conditional because the parser guarantees that it does.

Assignment

WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system.

For this particular project, you should pay special attention to the following classes:

  • ASTVisitor and DefaultASTVisitor - Interface and base class for AST visitors.
  • StaticAnalysis - Base class for static analysis passes (inherits from DefaultASTVisitor).
  • Symbol - Single Decaf symbol: a variable, array, function, or parameter.
  • SymbolTable - Contains a mapping from names to Symbol objects.
  • InvalidProgramException - Special exception that your code should create when it detects invalid types or some other semantic error; pass it to addError rather than throwing it directly.

Implement the BuildSymbolTables and MyDecafAnalysis classes. The former should build symbol tables, and the latter should perform type checking and other static analysis as described above.

You do not need to write any code to output symbol tables. I have provided that in the PrintDebugSymbolTables visitor, which is now called by DecafCompiler. As long as your code builds the symbol table correctly, it should be included in the standard output. See below for an example.

For type checking, you must report all errors using the addError mechanism within the StaticAnalysis class hierarchy. Your code should report at least one error for every invalid program and no errors for any valid program.

Sample Input

def int add(int x, int y)
{
    return x + y;
}

def int main()
{
    int a;
    a = 3;
    return add(a, 2);
}

Sample Output

Program
SYM TABLE:
  + print_str : void (str)
  + print_int : void (int)
  + print_bool : void (bool)
  + add : int (int,int)
  + main : int ()

  Function
  SYM TABLE:
    + x : int
    + y : int

    Block
    SYM TABLE:

  Function
  SYM TABLE:

    Block
    SYM TABLE:
      + a : int

[Figure: AST for add.decaf]