I have a scripting language that I am implementing where every statement is terminated by a newline, except possibly the last one. It's also possible to have lines that are just newlines. Here's one implementation (statement code omitted since it doesn't really add anything):
@_('statement_set statement', 'statement_set')
def script(self, p):
if len(p) > 1:
p.statement_set.append(p.statement)
return p.statement_set
@_('statement_or_NL')
def statement_set(self, p):
return p.statement_or_NL
@_('statement_set statement_or_NL')
def statement_set(self, p):
p.statement_set.append(p.statement_or_NL)
return p.statement_set
@_('statement NEWLINE', 'NEWLINE')
def statement_or_NL(self, p):
if len(p) > 1:
return [p.statement]
else:
return []
@_('PLUS', 'MINUS')
def statement(self, p):
return p[0]
That generates the following rules:
Rule 0 S' -> script
Rule 1 script -> statement_set
Rule 2 script -> statement_set statement
Rule 3 statement_set -> statement_set statement_or_NL
Rule 4 statement_set -> statement_or_NL
Rule 5 statement_or_NL -> NEWLINE
Rule 6 statement_or_NL -> statement NEWLINE
If you'd like, I can post the generated states as well. The issue I'm running into is that, even when I eliminate all shift-reduce conflicts, I still almost always fail in one of the three scenarios of code that starts with a newline, code that ends with a newline, and code that has no newline at the end. Something goes awry in the parsing such that it will read the final statement (which ends with an EOF) as a "statement NEWLINE" and then error out because there is no new-line. Worse comes to worse, I have a version that works as long as there's no final statement without a newline, so I could probably rig the parser to either add it to the initial string with the code, or to insert the token, but that feels clunky.
I got this working in Ply previously, and I thought it was essentially the same code, but it does not seem to work for me.
Here's a fairly general solution for this sort of problem. It basically takes the statement list to be a two-state automaton, alternating between "ends-with-newline" and "ends-with-statement". The final result is a list of all statements (although you could do something more interesting with the newlines as well).
Since all newlines are ignored except for their role in separating statements, you can have any number of consecutive newlines, and newline(s) at the end of the input are optional. The simple transition diagram:
Finite automaton are easily converted into CFGs using the template:
That leads to the following (in which the actual definition of
statementis omitted, since the only thing that matters about it is that it cannot derive anything which starts with a statement terminator, which would be unusual to say the least, and that it must not derive empty, which occasionally requires some adjustments):Or, using Sly's debugfile, the grammar: