2015-01-22 Class 4 Overview (Ch. 2) lexer jobs - get rid of white space - collapse keywords and literals - numerical constants, strings, etc. DISCUSS: enum type { ID, LIT, SYM } class Token { type union { str, int, char } } vs. enum token { ID, LIT, PLUS, MINUS, LPAREN, RPAREN, etc. } class Token { type union { str, int } } vs. class Token { char } class Num < Token { int } class Sym < Token { str } key: being able to get lexer and parser to talk to each other - use whatever your compiler generator prefers symbol tables - key info: identifier, type, scope, and location - scope often implicit via multiple tables - location often not determined until code generation (data allocation) - purpose: pass info from declaration -> use - real compilers need to use hash tables for quick lookups - need to track changes made in each scope - don't need this for our compiler - section 2.7 assumes static scoping - this is what we'll use for Decaf - example: tables for gcd program - example: Fig. 2.38 - main point: we now have type information in the factor production - for our compiler, we will probably wait to build the symbol table until the analysis stage (for tighter coupling with type checking) intermediate code generation - syntax trees and three-address code (TAC/3AC) syntax trees - example: Fig. 2.39 - your Decaf grammar will look like this - main point: we have a parse tree at the end of the process - no real need for declarations at this point - abstraction for expression - no abstraction: put everything in the grammar - one level per group w/ multiple "|" clauses per level - specific but tedious - all abstraction: a single "expr" level - specify precedence and associativity with directives - can't include a subset in some cases (i.e. rel ops in conditionals) - need to balance abstraction: none vs. all static analysis - forbid multiple definitions of same ID within a single scope - forbid "break" and "continue" outside loops - verify l-values and r-values - type checking - types of lhs and rhs in assignments must match - operands must evaluate to appropriate types - actual parameter types must match formal parameters - return statements must match function return types and call site lhs - from Decaf document: - identifiers must be defined before use - program must contain a "main" method - int literals in array declarations must be >0 - type checking - indexed identifiers must be arrays - conditional expressions must evaluate to a boolean - for-loop iteration bounds must evaluate to an integer three-address code (3AC - Ch. 6.2.1) - x = y op z (binary operation) - x = op y (unary operation) - x = y (copy) - x = y[i] (indexed copy)(rhs) - x[i] = y " (lhs) - goto L (unconditional jump) - if x goto L (conditional jump) - ifFalse x goto L " (negation) - if x relop y goto L " (relational) - param x (parameter passing) - call f, n (procedure call) - x = call f, n (function call) - return x (function return) - x = &y (get address) - x = *y (address dereference) - *x = y (copy to address) (we probably won't need the last three for Decaf) - example: Fig 2.42 - need linear code emission mechanism - need to track labels - will want intermediate representation for 3AC - for optimizations and translation into x86 assembly - example: Fig. 2.44 and 2.45 - work through "a[i] = 2 * a[j-k]"