2015-01-22
Class 4
Overview (Ch. 2)

lexer jobs
  - get rid of white space
  - collapse keywords and literals
    - numerical constants, strings, etc.

DISCUSS:
        enum type {
            ID, LIT, SYM
        }
        class Token {
            type
            union { str, int, char }
        }

    vs.

        enum token {
            ID, LIT, PLUS, MINUS, LPAREN, RPAREN, etc.
        }
        class Token {
            type
            union { str, int }
        }

    vs.

        class Token       { char }
        class Num < Token { int  }
        class Sym < Token { str  }
        

  key: being able to get lexer and parser to talk to each other
    - use whatever your compiler generator prefers

symbol tables
  - key info: identifier, type, scope, and location
    - scope often implicit via multiple tables
    - location often not determined until code generation (data allocation)
  - purpose: pass info from declaration -> use
  - real compilers need to use hash tables for quick lookups
    - need to track changes made in each scope
    - don't need this for our compiler
  - section 2.7 assumes static scoping
    - this is what we'll use for Decaf
  - example: tables for gcd program
  - example: Fig. 2.38
    - main point: we now have type information in the factor production
    - for our compiler, we will probably wait to build the symbol table
      until the analysis stage (for tighter coupling with type checking)

intermediate code generation
  - syntax trees and three-address code (TAC/3AC)


syntax trees
  - example: Fig. 2.39
    - your Decaf grammar will look like this
  - main point: we have a parse tree at the end of the process
    - no real need for declarations at this point
  - abstraction for expression
    - no abstraction: put everything in the grammar
      - one level per group w/ multiple "|" clauses per level
      - specific but tedious
    - all abstraction: a single "expr" level
      - specify precedence and associativity with directives
      - can't include a subset in some cases (i.e. rel ops in conditionals)
    - need to balance abstraction: none vs. all

static analysis
  - forbid multiple definitions of same ID within a single scope
  - forbid "break" and "continue" outside loops
  - verify l-values and r-values
  - type checking
    - types of lhs and rhs in assignments must match
    - operands must evaluate to appropriate types
    - actual parameter types must match formal parameters
    - return statements must match function return types and call site lhs
  - from Decaf document:
    - identifiers must be defined before use
    - program must contain a "main" method
    - int literals in array declarations must be >0
    - type checking
      - indexed identifiers must be arrays
      - conditional expressions must evaluate to a boolean
      - for-loop iteration bounds must evaluate to an integer

three-address code (3AC - Ch. 6.2.1)

    -  x = y op z               (binary operation)
    -  x = op y                 (unary operation)
    -  x = y                    (copy)
    -  x = y[i]                 (indexed copy)(rhs)
    -  x[i] = y                      "        (lhs)

    -  goto L                   (unconditional jump)
    -  if x goto L              (conditional jump)
    -  ifFalse x goto L              "            (negation)
    -  if x relop y goto L           "            (relational)

    -  param x                  (parameter passing)
    -  call f, n                (procedure call)
    -  x = call f, n            (function call)
    -  return x                 (function return)

    -  x = &y                   (get address)
    -  x = *y                   (address dereference)
    -  *x = y                   (copy to address)

       (we probably won't need the last three for Decaf)

  - example: Fig 2.42
  - need linear code emission mechanism
  - need to track labels
  - will want intermediate representation for 3AC
    - for optimizations and translation into x86 assembly

  - example: Fig. 2.44 and 2.45
    - work through "a[i] = 2 * a[j-k]"