Decaf

PA3: Decaf Parser

Objective

The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.

The goal of this part of the project is to gain experience implementing syntax analysis by implementing a recursive-descent parser as the second pass of our Decaf compiler.

Introduction

The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement a parser for Decaf, which will be the second phase in our compiler. You will do this by transforming the given Decaf grammar into an LL(1) grammar and implementing a recursive descent parser.

Thoroughly read the Decaf grammar in the reference document. For this assignment, you will need to construct a parser that will accept a stream of tokens and return an abstract syntax tree (AST). There are various AST objects, each of which represents some construct in the Decaf grammar.

You should download the project starter files from the "Files" tab in Canvas. The starter files include many new .java files as well as .class files representing compiled solutions.

Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.

If you have questions about the existing code, please ask them early!

Assignment

WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system.

First, you should convert the given Decaf grammar into a form suitable for building a recursive descent parser. In other words, you should convert the grammar into an LL(1) grammar. You should pay particular attention to properly enforcing associativity and precedence as well as removing the need for any backtracking (e.g., by removing left recursion and left factoring).

Then, you should implement a recursive descent parser for the grammar in the MyDecafParser class by providing an implementation for the following function:

    public ASTProgram parseProgram(Queue<Token> tokens)
            throws InvalidSyntaxException

This function takes as input a Queue of Token objects from the lexing phase. The function should return an AST, rooted in an ASTProgram object. This function represents the root non-terminal parsing routine in your recursive descent parser. You should create corresponding parser functions for each major non-terminal in your grammar.

For this particular project, you should pay special attention to the following classes:

  • DecafParser - Base class for your parser. Provides some useful helper methods for manipulating the queue of tokens.
  • ASTNode - Abstract syntax tree (AST) base class. All other AST node objects inherit from this one. See the Decaf reference document (pg. 4) for a list of hierarchy relationships between the various child classes of ASTNode. You will need to read and use all of these classes in your parser.
  • InvalidSyntaxException - Special exception that your code should throw (with a descriptive error message!) when you detect invalid syntax.

The provided driver will print the resulting AST as formatted text (see below), and it will attempt build a graphical representation using the Dot language from the GraphViz package. However, the conversion from a ".dot" file to a graphic (e.g., *.png) will not work if there is no dot utility installed and accessible on the command line. If it does not work, you may want to figure out an alternate way to render .dot files to .png files on your system.

If you would like to check your work against the output of the reference compiler, you can use the --fdump-tree and --fdump-tree-dot command-line options to generate text-based and graphical trees (respectively).

For this assignment, you should pass through the source info from the tokens to the AST objects. You will be graded on the quality of your error messages for programs that fail to compile, both in terms of how helpful the message was and whether it contained accurate source file info.

Sample Input

def int add(int x, int y)
{
    return x + y;
}

def int main()
{
    int a;
    a = 3;
    return add(a, 2);
}

Sample Output

Program  [add.decaf:1:1]
  Function: add : int (x:int, y:int)  [add.decaf:1:1]
    Block  [add.decaf:2:1]
      Return  [add.decaf:3:5]
        BinaryExpr: +  [add.decaf:3:14]
          Location: x  [add.decaf:3:12]
          Location: y  [add.decaf:3:16]
  Function: main : int ()  [add.decaf:6:1]
    Block  [add.decaf:7:1]
      Variable: a : int size=1  [add.decaf:8:5]
      Assignment: a = 3  [add.decaf:9:5]
        Location: a  [add.decaf:9:5]
        Literal: 3 : int  [add.decaf:9:9]
      Return  [add.decaf:10:5]
        FunctionCall: add (a, 2)  [add.decaf:10:12]
          Location: a  [add.decaf:10:16]
          Literal: 2 : int  [add.decaf:10:19]
AST for add.decaf

Submission

Please submit your .java file to the appropriate assignment on Canvas.