2015-02-17
Parsing (Ch. 4)
CYK algorithm
- dynamic programming
- supports any context-free grammar in Chomsky normal form
Earley parser (Jay Earley, 1968)
- dynamic programming
- supports any context-free grammar
- O(n^3)
LL: left-to-right, leftmost derivation ("top-down")
- "recursive descent"
- LL(k) (Lewis and Stearns, 1968)
- k tokens of lookahead
- very expensive (when introduced)
- LL(1)
- exactly one token lookahead
- much more efficient
- uses FIRST() and FOLLOW() to build parse tables
- efficient LL(k) (Terence Parr, 1990)
- k tokens of lookahead
- k is gradually increased w/ backtracking as a failback
- basis of original ANTLR
- GLL (Bernard Lang, 1974 & Scott and Johnstone, 2010)
- LL analogue to GLR
- maintains multiple parsing descriptors on a stack
- LL(*) (Terence Parr and Kathleen Fisher, 2011)
- LL analogue to LAR(m)
- regular lookaheads w/ cyclic DFAs
- ALL(*) (Terence Parr et. al., 2014)
- parallel regular lookaheads
- dynamic; adapts to input sentences
- simulates augmented recursive transition networks (ATNs)
- basis of ANTLR 4
LR: left-to-right, rightmost derivation ("bottom-up")
- "shift/reduce"
- Canonical LR (Don Knuth, 1965)
- allows duplicated states
- fewer conflicts; much larger parse tables
- SLR (Frank DeRemer)
- simple" LR: LR(0) states and their transitions
- FOLLOW() used to compute lookaheads
- LALR (Frank DeRemer, 1969)
- similar to SLR but w/ smaller lookaheads
- equivalently, a simplified version of canonical LR
- more complicated lookahead calculation
- basis of yacc/bison
- recursive ascent (Thomas Penello, 1986)
- GLR (Masaru Tomita, 1984)
- generalized LR(k): returns all valid parses
- spawns parallel parsing processes on conflicts
- graph-structured stack (GSS)
- more efficient than Earley
- LAR(m) (Bermudez and Schimpf, 1990)
- regular lookaheads w/ cyclic DFAs
PEG: parsing expression grammar (Bryan Ford, 2004)
- similar to CFG-based grammars, but alternation is inherently ordered
- often implemented with "packrat" parsers that memoize partial results
Additional reading for next Tuesday (2/24):
"Adaptive LL(*) Parsing: The Power of Dynamic Analysis"
Terence Parr, Sam Harwell, and Kathleen Fisher (OOPSLA'14)
http://dl.acm.org/citation.cfm?id=2660202
PDF is posted on Canvas (see "Files" section)
Project 3: Decaf Parser
- decaf program -> parse tree -> AST
- ANTLR grammar for 1st transition
- ANTLR visitor for 2nd transition
- minor modifications to document
- see website for modified grammar
- common AST structures (DecafAST.py)
- download from website
- do NOT modify this file (may need to replace it)
- significant hints/discussion on website
- sample Decaf programs
- canonical solution: <100 lines for Decaf.g4
<200 lines for DecafASTConverter.py