2015-01-29 Class 6 Finite Automata (3.6-3.9) finite automata - state machines - definition: (Σ, Q, q_0, F, δ) Σ alphabet (set of input symbols) Q states q_0 initial (start) state F set of final states δ transition function: (Q, Σ) -> Q - transition diagrams (Fig. 3.24) - transition tables (Fig. 3.25) - deterministic vs. non-deterministic - NFA definition: DFA w/ multiple paths and ε-transitions δ: (Q, (Σ ∪ {ε})) -> [Q] - ε-closure(s): all states reachable from s via ε-transitions see Fig. 3.33 - purpose: "recognize" sentences in regular languages DFA algorithm: s := q_0 for each input c: s := δ(s,c) return s ∈ F NFA algorithm: S := ε-closure(q_0) for each input c: T := {} for each s in S: T := T ∪ ε-closure(δ(s,c)) S = T return |S ∩ F| > 0 equivalence of regexes, NFAs, and DFAs - transitions: RE -> NFA -> DFA -> RE - RE -> NFA: inductive construction (Alg. 3.23) Fig. 3.40, 3.41, 3.42 (add new states and ε-transitions) constant size increase at each level - NFA -> DFA: subset construction (Alg. 3.20) Fig. 3.32, 3.33 (simulate all possible NFA executions) possible exponential increase in # of states power set of NFA states: 2^|Q| - DFA -> RE: informal concept (gradually remove states & label transitions with REs) efficiencies - Fig. 3.48 - NFAs: quicker to build, slower to accept constant expense as regex grows simulation must track list of possible states (better for one-off applications like grep) - DFAs: slower to build, faster to accept potentially expensive conversion phase only one possible state at any given time (better for lexical analysis) generating lexical analysis routines - combine NFAs for all individual patterns - add new start state with ε-transitions to start of each NFA - heuristics for resolving conflicts - choose longest prefix that matches some pattern - choose pattern that appears first in the specification - approach 1: use NFA directly - scan until next state set (T) is empty - look backwards through path to find a state set with a final state - approach 2: convert to DFA first - scan until next state is a dead state - look backwards through path to find a final state - handling lookahead constructs - scan past the pattern until the lookahead context is matched - backtrack and only accept pattern optimizations - RE -> DFA direct conversion (Alg. 3.36 & Fig. 3.62) - parse RE into syntax tree - calculate functions for leaf nodess - nullable, firstpos, lastpos, followpos - use these functions to build DFA - start with state representing firstpos(n_0) - add transitions on each input in Σ (and destination states if necessary) - new states represent unions of followpos(p) - for each p represented in the current state that represents a RE syntax node with the desired input - DFA minimization - basic idea: collapse states that have identical behaviors - create initial partitions containing non-final and final states - repeatedly split partitions if there is an input that "distinguishes" between states in the partition - DFA transition table compaction - eliminate many duplicate values ("default" table) - de-duplicate entries for similar states ("next"/"check" tables)