PA2: Decaf Lexer
Objective
The goal of our semester-long project is to gain experience in compiler implementation by constructing a simple compiler.
The goal of this part of the project is to gain experience specifying lexical analysis patterns by using regular expressions to implement the first pass of our Decaf compiler.
Introduction
The semester-long project for this course is to build a compiler for the Decaf language. In this project, you will implement a lexer for Decaf, which will be the first phase in our compiler. You will do this by creating regular expressions to describe various pieces of the Decaf language, and writing some Java code to repeatedly apply these regular expressions to an input character stream.
Thoroughly read the "Lexical Considerations" section of the reference document and look over the grammar in the next section. For this assignment, you will need to construct a lexer that will scan a Decaf program and produce a stream of tokens.
You should download the project starter files from the "Files" tab in Canvas. You may also be interested in the Eclipse setup tutorial.
Before you begin writing code, you should spend some time reading through all of the infrastructure files provided in the starter files. Our compiler this semester is a large project. Your solution will need to interface with several other classes, and you will need to thoroughly read through those classes to understand their interactions. Also, there are utility functions that you may find useful.
As you study the code, you may find it useful to generate Javadoc
references for the existing project files. To do this, run mvn site
and then open (using a web browser) the index.html
file that is
created in the target/site/apidocs
folder.
If you have questions about the existing code, please ask them on the Piazza forum (see the link on the sidebar in Canvas).
Assignment
WARNING: You should only proceed with actual development once you are SURE you understand exactly what your task is and how your code should interact with the rest of the system.
Implement the MyDecafLexer
class by providing an implementation
for the following function:
public Queue<Token> lex(BufferedReader input, String filename) throws IOException, InvalidTokenException
This function takes as input a BufferedReader
object that
should be used to read in the Decaf source, and a filename
that should be used to create source line information. The function should
return a queue of scanned Token
objects.
The Token
objects should contain source code information,
encoded in SourceInfo
objects. The source info should contain
filename and line number information for all tokens.
For this particular project, you should pay special attention to the following classes:
DecafCompiler
- The overall driver for the compiler. Right now it only implements the lexer phase of compilation; it will gradually expand over the semester as we implement more phases of the compiler.DecafLexer
- The base lexer class. You will implement a subclass of this class, and you may find some of the methods provided in the parent class useful in your solution.Token
- Stores a single Decaf token. Your code should generate a stream of these objects, stored in aQueue
.SourceInfo
- Stores source file information (i.e., filename, line number, and column number). Your code should make sure this information is present in anyToken
object you return.InvalidTokenException
- Special exception that your code should throw when you detect an invalid token.
The recommended solution for this project (as discussed in class) is to
process the input one line at a time. Write one regex per token type in general
(there may be one or two exceptions). Prioritize your regular expressions and
try each of them in order. When you find a match, extract the matching text from
the line and continue until either no regex matches the next input (i.e., an
InvalidTokenException
) or the entire input is consumed. Check out
DecafLexer
for a variety of useful functions that will do some of
the above work for you!
Sample Input
def int main() { int a; a = 4 + 5; return a; }
Sample Output
KEY "def" [pa02-sample.decaf:1] KEY "int" [pa02-sample.decaf:1] ID "main" [pa02-sample.decaf:1] SYM "(" [pa02-sample.decaf:1] SYM ")" [pa02-sample.decaf:1] SYM "{" [pa02-sample.decaf:2] KEY "int" [pa02-sample.decaf:3] ID "a" [pa02-sample.decaf:3] SYM ";" [pa02-sample.decaf:3] ID "a" [pa02-sample.decaf:4] SYM "=" [pa02-sample.decaf:4] DEC "4" [pa02-sample.decaf:4] SYM "+" [pa02-sample.decaf:4] DEC "5" [pa02-sample.decaf:4] SYM ";" [pa02-sample.decaf:4] KEY "return" [pa02-sample.decaf:5] ID "a" [pa02-sample.decaf:5] SYM ";" [pa02-sample.decaf:5] SYM "}" [pa02-sample.decaf:6]
Reflection
One of the goals of this course is to encourage introspection and thoughtful, deliberate development processes. For this project, you will submit a short (2-3 paragraphs) reflection, answering any (or all) of the following questions as appropriate:
- How does this project fit into the overall picture of our semester-long compiler project? Why is it important?
- Describe your design and development process. Did you use a formal software development method?
- What aspects of this project proved to be the most rewarding?
- What aspects of this project proved to be the most challenging? How did you overcome these challenges?
- How do you know your submission is correct? Briefly describe your testing regimen.
- If you had to start the project again from scratch, what would you do differently?
- What concepts from our theoretical class material or techniques from our classroom activities did you apply in this project?
- Suppose you weren’t using any of the concepts or techniques we’ve covered in class this semester; how would your solution to this project be different?
- What other areas of computer science (or CS courses you’ve taken) impacted your solution to this project?
- Do you have any other feedback about this project?
Include as much detail as you can, but do not ramble. Concise answers are preferred over verbose ones. Your reflection will be graded on a five-point scale:
Insightful | 5 |
4 | |
Superficial | 3 |
2 | |
Deficient | 1 |
No submission | 0 |
Submission
DUE DATE: Friday, September 22, 23:59 EDT (11:59PM)
Please submit ONLY your MyDecafLexer.java
file and your
plain-text reflection to the appropriate assignments on Canvas. Additionally,
make sure the package edu.jmu.decaf; line is not modified.
Code Reviews
One of the goals of this course is to encourage good software development practices, especially when building a large software system (such as a compiler). For this project, you will be assigned two other random students in the course. You must review their code and offer constructive feedback according to the given rubric.
Submit your review on Canvas by Friday, September 29. Please submit your review as a comment (not an attached file).
Grading
Here is the grading breakdown for this project:
Automated tests | 80 |
Instructor review | 5 |
Reflection | 5 |
Peer review | 10 |
TOTAL | 100 |