2015-02-17
Parsing (Ch. 4)

CYK algorithm
  - dynamic programming
  - supports any context-free grammar in Chomsky normal form

Earley parser (Jay Earley, 1968)
  - dynamic programming
  - supports any context-free grammar
  - O(n^3)

LL: left-to-right, leftmost derivation ("top-down")
  - "recursive descent"

  - LL(k) (Lewis and Stearns, 1968)
    - k tokens of lookahead
    - very expensive (when introduced)

  - LL(1)
    - exactly one token lookahead
    - much more efficient
    - uses FIRST() and FOLLOW() to build parse tables

  - efficient LL(k) (Terence Parr, 1990)
    - k tokens of lookahead
    - k is gradually increased w/ backtracking as a failback
    - basis of original ANTLR

  - GLL (Bernard Lang, 1974 & Scott and Johnstone, 2010)
    - LL analogue to GLR
    - maintains multiple parsing descriptors on a stack

  - LL(*) (Terence Parr and Kathleen Fisher, 2011)
    - LL analogue to LAR(m)
    - regular lookaheads w/ cyclic DFAs

  - ALL(*) (Terence Parr et. al., 2014)
    - parallel regular lookaheads
    - dynamic; adapts to input sentences
    - simulates augmented recursive transition networks (ATNs)
    - basis of ANTLR 4

LR: left-to-right, rightmost derivation ("bottom-up")
  - "shift/reduce"

  - Canonical LR (Don Knuth, 1965)
    - allows duplicated states
    - fewer conflicts; much larger parse tables

  - SLR (Frank DeRemer)
    - simple" LR: LR(0) states and their transitions
    - FOLLOW() used to compute lookaheads

  - LALR (Frank DeRemer, 1969)
    - similar to SLR but w/ smaller lookaheads
    - equivalently, a simplified version of canonical LR
    - more complicated lookahead calculation
    - basis of yacc/bison

    - recursive ascent (Thomas Penello, 1986)

  - GLR (Masaru Tomita, 1984)
    - generalized LR(k): returns all valid parses
    - spawns parallel parsing processes on conflicts
    - graph-structured stack (GSS)
    - more efficient than Earley

  - LAR(m) (Bermudez and Schimpf, 1990)
    - regular lookaheads w/ cyclic DFAs

PEG: parsing expression grammar (Bryan Ford, 2004)
  - similar to CFG-based grammars, but alternation is inherently ordered
  - often implemented with "packrat" parsers that memoize partial results

Additional reading for next Tuesday (2/24):
  "Adaptive LL(*) Parsing: The Power of Dynamic Analysis"
  Terence Parr, Sam Harwell, and Kathleen Fisher (OOPSLA'14)
  http://dl.acm.org/citation.cfm?id=2660202
  PDF is posted on Canvas (see "Files" section)

Project 3: Decaf Parser
  - decaf program -> parse tree -> AST
    - ANTLR grammar for 1st transition
    - ANTLR visitor for 2nd transition
  - minor modifications to document
    - see website for modified grammar
  - common AST structures (DecafAST.py)
    - download from website
    - do NOT modify this file (may need to replace it)
  - significant hints/discussion on website
    - sample Decaf programs
  - canonical solution: <100 lines for Decaf.g4
                        <200 lines for DecafASTConverter.py