- LitD:Kn65.pdf, Kn65↗ ⊥
- Donald E. Knuth
- 1965
- On the Translation of Languages from Left to Right
- INFORMATION AND CONTROL 8, 607-639 (1965)
- my python implementation
LR(k) Grammatik, Parsing und grundlegenden Eigenschaften
LR(k)
- L = Left to right parsing
- R = Reverse, Rightmost Derivation (Auflösung von Rules, Tree Linearisierung: immer rechtestes NonTerminal auflösen)
- k Lookahead Terminals for unique parsing
i Intro and Defs
- "intermediates" I (Rules): A, B, C
- "terminals" T: a, b, c
- S denotes the "principal intermediate character"
- ε empty string
- production is a relation A → Θ mit Θ Strings über I ∪ T
- Grammar G: {Ap → Xp1 ... Xpnp | rulenumber p ∈ [1,s], np >= 0 }
- φ → ψ direkte Ableitung: ein Intermediat aus φ ersetzen durch eine Produktion
- φ ⇒ ψ transitiver Abschluss von →
- φ ⇛ ψ transitiver+reflexiver Abschluss von → i.e. φ → ψ or φ = ψ
- language einer Grammatik: {α String über T | S ⇒ α}
- sentential form: any string α for which S ⇒ α
- derivation tree or parse diagram
- Linearisierungen indem ein Intermediate nach dem anderen expandiert wird, Reihenfolge ist irrelevant, also z.B. leftmost oder rightmost auswählen
- "handle of a tree" to be the leftmost set of adjacent leaves forming a complete branch
- "pruning off" handles (d.h. leaves der handle entfernen) ==> Schritt für Schritt zurück nach S <==> rightmost derivation in revers
- (n, p) is "handle of α = X1 ... Xn ... Xt (String over T ∪ I)" iff ∃ derivation tree von α mit der handle Xr+1 ... Xn for production p
- "k-sentential form" is a sentential form followed by k ⊣ with ⊣ ∉ T ∪ I
- a grammar is "LR(k)" iff for any k-sentiential
- iff any handle is always uniquely determined by the string to its left and the k terminal characters to its right. Formally: iff ∀
- α = X1 ... Xn ... Xn+k Y1 ... Yu is a k-sentential form without Intermediates from Xn+1 to Yu
- β = X1 ... Xn ... Xn+k Z1 ... Zv is a k-sentential form without Intermediates from Xn+1 to Zv
- (n, p) handle of α and (n', p') handle of β
- then n = n' and p = p'
- this means we do all possible reductions at n, and then step one forward - no backtracking needed
- remarks
- handle implies, no reductions left of n, however, we have to consider all possible trees
- the pushdown automata the infinit number of possible prefixes (for n unlimited) to a finite number of combinations
ii. ANALYSIS OF LR(k) GRAMMARS
method 1: reduce to regular Grammar
define Grammar G' from G by
- Terminals: T ∪ I ∪ {⊣}
- Intermediates: [A, α] with α ∈ (T ∪ ⊣)k and [p]
- Hk(σ) = {α | ∃ β: σ ⇒ αβ}
- rules
- [Ap, α] → Xp1 ... Xp(j-1) [Xpj, β] with j <= np and β = Hk(Xp(j+1) ... Xpnp α)
- [Ap, α] → Xp1 ... Xpnp α [p]
now, the following to are equivalent
- [S, ⊣k] ⇒ X1 ... Xn ... Xn+k [p] in G'
- there exists a k-sentential form of G: X1 ... Xn ... Xn+k Y1 ... Yu with handle (n, p) and Xn+1 ... Yu not intermediates
easy to see for Knuth, I don't see how to prove the left-most property of handle (may be is not necessary because it follows from the second property below?)
thus
- G is LR(k) if and only if
- [S, ⊣k] ⇒ θ [p] and [S, ⊣k] ⇒ θ φ [q] implies φ = ε and p = q in G'
but G' is regular and well known methods exist to check this
method 2: LR(k) pushdown parser
careful for emtpy productions A → ε! modify Hk(σ) to omit all derivations when an intermediate as the initial character is replaced by ε.
state = set of [p, j, α] meaning we have parsed β Xp1 ... Xpj and there is a sentential form β Ap α ...
stack: S0X1S1X2S2 ... XnSn | Y1 ... Yk ω: left from the | are the already parsed characters and states, right follows the lookahead und the rest of the input
algorithm on stack as above:
- step1 compute recursively closure S' of S' = Sn ∪ {[q, 0, β] | ∃ [p,j; α] in S', Xp(j+1) = q and β ∈ Hk(Xp(j+2) ... Xpnp α)} : add all productions that could newly start
- step2 compute
- Z = {β | ∃ [p, j; alpha;] in S', j < np and β ∈ Hk(Xp(j+1) ... Xpnp α)} : in rule p, but not at end
- Zp = {α | ∃ [p, np; α]} : at end of rule p
- if these sets are not disjoint, error: grammar is not LR(k)
- if lookahead Y1 ... Yk in Z then shift Y1 onto stack ==> S0X1S1X2S2 ... XnSnY1 | Y2...
- elif lookahead in Yp then reduce stack by rule p ==> S0X1S1X2S2 ... Xn-npSn-npAp | Y1 ...
- else syntax error
- step3 after renaming the stack has now the form S0X1S1X2S2 ... XnSnXn+1 | Y1 ... Yk ω
- compute S' from Sn as in step1 (add all productions that could newly start)
- compute Sn+1 = {[p, j+1, α] | [p, j; a] ∈ S' and Xpj = Xn+1}: new state after Xn+1
- if Sn+1 = {[0, 1, ⊢k]} and lookahead = ⊢k then parsing succesfully finished
- else push Sn+1 on the stack and go to step1