2015-02-17 Parsing (Ch. 4) CYK algorithm - dynamic programming - supports any context-free grammar in Chomsky normal form Earley parser (Jay Earley, 1968) - dynamic programming - supports any context-free grammar - O(n^3) LL: left-to-right, leftmost derivation ("top-down") - "recursive descent" - LL(k) (Lewis and Stearns, 1968) - k tokens of lookahead - very expensive (when introduced) - LL(1) - exactly one token lookahead - much more efficient - uses FIRST() and FOLLOW() to build parse tables - efficient LL(k) (Terence Parr, 1990) - k tokens of lookahead - k is gradually increased w/ backtracking as a failback - basis of original ANTLR - GLL (Bernard Lang, 1974 & Scott and Johnstone, 2010) - LL analogue to GLR - maintains multiple parsing descriptors on a stack - LL(*) (Terence Parr and Kathleen Fisher, 2011) - LL analogue to LAR(m) - regular lookaheads w/ cyclic DFAs - ALL(*) (Terence Parr et. al., 2014) - parallel regular lookaheads - dynamic; adapts to input sentences - simulates augmented recursive transition networks (ATNs) - basis of ANTLR 4 LR: left-to-right, rightmost derivation ("bottom-up") - "shift/reduce" - Canonical LR (Don Knuth, 1965) - allows duplicated states - fewer conflicts; much larger parse tables - SLR (Frank DeRemer) - simple" LR: LR(0) states and their transitions - FOLLOW() used to compute lookaheads - LALR (Frank DeRemer, 1969) - similar to SLR but w/ smaller lookaheads - equivalently, a simplified version of canonical LR - more complicated lookahead calculation - basis of yacc/bison - recursive ascent (Thomas Penello, 1986) - GLR (Masaru Tomita, 1984) - generalized LR(k): returns all valid parses - spawns parallel parsing processes on conflicts - graph-structured stack (GSS) - more efficient than Earley - LAR(m) (Bermudez and Schimpf, 1990) - regular lookaheads w/ cyclic DFAs PEG: parsing expression grammar (Bryan Ford, 2004) - similar to CFG-based grammars, but alternation is inherently ordered - often implemented with "packrat" parsers that memoize partial results Additional reading for next Tuesday (2/24): "Adaptive LL(*) Parsing: The Power of Dynamic Analysis" Terence Parr, Sam Harwell, and Kathleen Fisher (OOPSLA'14) http://dl.acm.org/citation.cfm?id=2660202 PDF is posted on Canvas (see "Files" section) Project 3: Decaf Parser - decaf program -> parse tree -> AST - ANTLR grammar for 1st transition - ANTLR visitor for 2nd transition - minor modifications to document - see website for modified grammar - common AST structures (DecafAST.py) - download from website - do NOT modify this file (may need to replace it) - significant hints/discussion on website - sample Decaf programs - canonical solution: <100 lines for Decaf.g4 <200 lines for DecafASTConverter.py