Here is a simple but very common grammar rule case in EBNF format, the Statements is a none terminal symbol and Statement is none terminal symbol:
Statements ::= (Statement ';')*
After converting this rule to NFA, and then do the subset contruction for converting the NFA to DFA, and at last get the dfa:
State0 -> Statement -> State1 -> ';' ->State0
State0 -> ε -> State0
The State0 is the DFA's start state representing the none terminal symbol Statements, also it is the finish state.
From State0 input Statement and traslate to State1 and input ';' at State1, translate to State0.
Also, State0 could translate to self with the ε.
And after converting the above dfa to regular grammar following the algorithm in dragon book, i get the following grammar rules:
Statements -> ε
Statements -> Statement Extend_NT
Extend_NT -> ';' Statements
It added the new none terminal symbol Extend_NT, but i want to get the following the regular grammars which does not contain the extend symbol Extend_NT:
Statements -> ε
Statements -> Statement ';' Statements
So the question is that is there any algorithm could get the above result that does not contain the new none terminal symbol Extend_NT?
Or it is just a engineering problem?
I'm not really sure I understand your question.
If you just have a single production for a non-terminal and that production just has a single repetition operator, either at the beginning or the end, you can apply a simple desugaring: (Here
αandβare sequences of grammar symbols (but not EBNF symbols), andαmight be empty.)If
αis empty, you could use either of the above. For an LALR(1) parser generator, the left-recursive version would be the usual choice; if you're building an LL parser, you will of course want the right-recursive version.If there is more than one EBNF operator in the right-hand side, then you will need extra non-terminals, one for each EBNF repetition operator, except possibly one.
I don't know whether you'd call that an algorithm. I personally think of it as engineering, but the difference is not absolute.