Resolving ambiguity in simple Instaparse grammar

116 Views Asked by At

[Also posted on the Instaparse mailing list, but posted here as well since I'm guessing this is a fairly general problem]

Consider the grammar

 D = (B|S)*
 S = 'S' B*
 B = 'B'

(This is Instaparse's version of BNF...)

B can occur by itself, or after S; if the latter, it should be considered part of the, er, S expression (no pun intended).

Example:

(-> "D = (B|S)*
     S = 'S' B*
     B = 'B'"
    parser
    (parses "BSBB"))

;;=>
([:D [:B "B"] [:S "S"] [:B "B"] [:B "B"]]
 [:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]]    ;; <------
 [:D [:B "B"] [:S "S" [:B "B"]] [:B "B"]])

I'd like only the second result to match -- so that B gets included inside S when possible, and to remove the other options. What needs to be done to my parser to make this change?

More example expressions shown in this gist.

1

There are 1 best solutions below

0
Michał Marczyk On BEST ANSWER

You can use negative lookahead to postulate that matches of S must not be followed by valid Bs:

(-> "

D = (B|S)*
S = 'S' B* !B
B = 'B'

"
insta/parser
(insta/parses "BSBB"))
;= ([:D [:B "B"] [:S "S" [:B "B"] [:B "B"]]])

This works for all the examples in (the current version of) your gist as well.