Decaf

P2: Decaf Parser

Objective

The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.

The goal of this part of the project is to gain experience implementing syntax analysis by implementing a recursive-descent parser as the second pass of our Decaf compiler.

Introduction

The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement a parser for Decaf, which will be the second phase in our compiler. You will do this by transforming the given Decaf grammar into an LL(1) grammar and implementing a recursive descent parser.

Thoroughly read the Decaf grammar in the reference document. For this assignment, you will need to construct a parser that will accept a stream of tokens and return an abstract syntax tree (AST). There are various AST objects, each of which represents some construct in the Decaf grammar.

The starter files are available on stu at the following path:

/cs/students/cs432/f23/p2-parser.tar.gz

Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful. If you have not yet read the C for CS 432 notes yet, you should do so now.

As you study the code, you may find it useful to generate HTML or PDF references for the existing project files. To do this, run make docs and then see the resulting files in the "html" and "latex" subfolders. I have generated a copy of the HTML documentation for your convenience: P2 reference

If you have questions about the existing code, please ask them on the #projects channel in Discord (see the link in Canvas).

Assignment

WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system. Make sure you read the "Syntax" section of the language reference carefully in addition to this description!

Implement syntax analysis by providing an implementation for the following function:

ASTNode* parse (TokenQueue* input);

This function takes as input a TokenQueue of Token structures from the lexing phase. The function should return an abstract syntax tree (AST), rooted in a single ASTNode object. This function represents the root non-terminal (i.e., start symbol) parsing routine in your recursive descent parser.

For this particular project, you should pay special attention to the following files:

  • src/main.c - The overall driver for the compiler. Right now it only implements the lexer and parser phases of compilation. You may change this file if it is useful for testing, but your submission should work with the original version.
  • src/p2-parser.c - The parser implementation. Your implementation of parse will go here, and you may add helper functions here if you find them useful (but note that they will be inaccessible to other modules--you should NOT modify p2-parser.h).
  • include/ast.h - Describes the interface to ASTNode structure and its associated functions/methods, which you should use to write your solution. YOU SHOULD NOT MODIFY THIS FILE. The implementations are in ast.c, which you may want to skim as well (although you shouldn't change those either).
  • include/visitor.h - Describes the interface to NodeVisitor structure and its associated functions/methods, which you won't really need to use in this project (it's only used in main() for debug output).
  • include/common.h - Contains some constants, functions, and macros that are useful in multiple modules in the compiler. In particular, you should use Error_throw_printf for "throwing" front-end errors, and you may wish to look into how the DECL_LIST_TYPE and DEF_LIST_IMPL macros work together to ease the creation of append-only list structures (e.g., one singly-linked list implementation that can be re-used in a way similar to C++ templates). YOU SHOULD NOT MODIFY THIS FILE. The implementations are in common.c, which you may want to skim as well (although you shouldn't change those either).
  • tests - Testing framework folder. These files use the same general layout as described in the CS 261 project guide, so you may wish to review that document. UNLIKE IN CS 261, YOU WILL NOT BE PROVIDED ALL TEST CASES, only a few sanity test checks to help you estimate your progress in completing the project. You should significantly extend the provided test suite and convince yourself that your solution completely implements this phase of the compiler as described by the language reference and the description provided here.

You will first want to make sure you understand the grammar in the Decaf reference document. You may want to spend a little bit of time identifying locations where is is not LL(1) so that you can anticipate obstacles to writing a recursive-descent parser. Then, you should begin writing your parser, using the grammar as a guide as discussed in class and the textbook.

WARNING: This is a large project! The code you will write is not terribly complicated, but there is a lot of it because you will be writing a method for almost every non-terminal in your grammar. This is a daunting task if you have never written a parser before, so you must start early and ask questions. I suggest starting with a much-reduced form of the grammar, then gradually expanding it. Here is an example grammar that represents a small portion of the Decaf grammar, suitable for getting started on this project:

Program -> VarDecl*
VarDecl -> Type IDENTIFIER ';'
Type -> 'int' | 'bool'

Of course, variable declarations are more complex in real Decaf because they could be arrays. However, the above grammar does describe a complete language, and that language is a subset of the real Decaf language. Thus, it is an easier target to aim for when you first start working on the project.

NOTE: For literals, you should also parse the token text into the appropriate format. For integers, this means converting it to an actual integer type, and for string literals this means removing the leading and trailing quotes and expanding escape codes.

Your solution should accept all valid Decaf programs, building the appropriate AST and annotating every node with source information (each node should have a line number that corresponds to the line number of the first token that is a part of that node or any children/descendant nodes). It should also reject invalid Decaf programs that do not conform to the syntactic requirements of the language as described in section 3 of the reference. Reject invalid programs using Error_throw_printf, printing a descriptive error message about the earliest error encountered along with source information.

Finally, there may be cases that you encounter where the language specification and the above requirements do not fully describe the expected behavior. For such cases, feel free to ask on Piazza, but such situations should be rare and inconsequential in the big picture. In general, for such cases you should just make sure the output from your solution matches the output of the reference compiler. Run it with no arguments or "-h" to get a list of valid output flags. For this project, use the "--fdump-tree" flag to print the output after the lexing phase (note that it will then finish the rest of compilation, but you can ignore any output from the later phases).

HINT: Use the provided rubric (see "Grading" below) as a guide to completing this project. First, complete all of the requirements to earn a "D", then the requirements to earn a "C", and so forth. Work incrementally and test thoroughly to ensure you have a comprehensive solution.

HINT: Check out the simple expression parser (tarball available at /cs/students/cs432/f23/expr_parser.tar.gz) as a demonstration of the basic techniques required to handle the expression part of the Decaf language.

Sample Input

def int add(int x, int y)
{
    return x + y;
}

def int main()
{
    int a;
    a = 3;
    return add(a, 2);
}

Sample Output

Program [line 1]
  FuncDecl name="add" return_type=int parameters={x:int,y:int} [line 1]
    Block [line 2]
      Return [line 3]
        Binaryop op="+" [line 3]
          Location name="x" [line 3]
          Location name="y" [line 3]
  FuncDecl name="main" return_type=int parameters={} [line 6]
    Block [line 7]
      VarDecl name="a" type=int is_array=no array_length=1 [line 8]
      Assignment [line 9]
        Location name="a" [line 9]
        Literal type=int value=3 [line 9]
      Return [line 10]
        FuncCall name="add" [line 10]
          Location name="a" [line 10]
          Literal type=int value=2 [line 10]
AST for add.decaf

Submission

Please submit your p2-parser.c file on stu AND to the appropriate assignment on Canvas by the due date indicated there.

To submit on stu, run the following command from your project folder:

/cs/students/cs432/f23/submit.sh p2

THIS PROJECT REQUIRES SIGNIFICANTLY MORE CODE THAN P1! To encourage early work on the project, I suggest aiming to finish "C" or "B" levels by the milestone date (typically a week before the normal deadline). If you submit by the milestone date, I will run the full test suite on your submission and give you some informal feedback about your progress and issues to address before the final submission.

Code Reviews

One of the goals of this course is to encourage good software development practices, especially when building a large software system (such as a compiler). For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.

Submit your review on Canvas by the Friday following this project's due date, as described in the appropriate Canvas assignment.

Grading

Here is the grading breakdown for this project:

Instructor grade90
Code review10
TOTAL100

Here is the rubric for the "Instructor grade" part of the overall grade:

Grade Description Requirements
A Exceptional
  • Correct associativity and precedence
  • Handle all edge cases correctly
  • No compiler warnings or memory leaks
  • Report descriptive error messages w/ debug info (assessed manually)
  • All of the below
B Good
  • Parse function declarations w/ parameters
  • Parse conditionals and while loops
  • Parse binary and unary expressions
  • Parse base expressions (including subexpressions and function calls)
  • Include source code line info
  • Reject invalid programs with the above components
  • All of the below
C Satisfactory
  • Parse function declarations w/o parameters
  • Parse blocks, assignments, breaks, continues, and returns
  • Parse base expressions (literals and locations)
  • Parse array variable declarations and locations
  • Reject invalid programs with the above components
  • All of the below
D Deficient
  • Parse types and identifiers
  • Parse variable declarations w/o arrays
  • Reject invalid programs with the above components
F Unacceptable
  • Some evidence of a good-faith attempt

The above rubric shows the base grade possible if your submission meets the criteria listed. Most of the items listed will be assessed via automated testing, consisting of test cases that are mostly NOT provided to you in advance (so write your own tests!).

I will also be examining your submission manually for acceptable style and documentation, as well as the use of any unsafe functions. If your submission is deficient in any of these categories you may earn a numerical deduction, and if it is particularly egregious it will earn a half- or full-letter grade deduction depending on the severity of the issue. If you are unsure regarding my standards in this regard, I would suggest reviewing the style guide and list of unsafe functions from CS 261.