I'm using grako (a PEG parser generator library for Python) to parse a simple declarative language where a document can contain one or more protocols.
Originally, I had the root rule for document written as:
document = {protocol}+ ;
This appropriately returns a list of protocols, but only gives helpful errors if a syntax error is in the first protocol. Otherwise, it silently discards the invalid protocol and everything after it.
I have also tried a few variations on:
document = protocol document | $ ;
But this doesn't produce a list if there's only one protocol, and it doesn't give helpful error messages either: if any of the protocols contains an error, it says only "no available options: (...) document".
How do I write a rule that does both of the following?
- Always returns a list, even if there's only one protocol
- Displays helpful error messages about the unsuccessful match, instead of just saying it's an invalid document or silently dropping the damaged protocol
This is the solution:
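A rule along these lines does both. This is a sketch in grako's EBNF that assumes the fix consists of exactly the two elements explained next, a cut (~) inside the closure and an end-of-input check ($); the definition of protocol is left out:

```
document = {protocol ~}+ $ ;
```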
If you don't add the $ for the parser to see the end of file, the parse will succeed with one or more protocol, even if there are more to parse.
Adding the cut expression (~) makes the parser commit to what was parsed in the closest option/choice in the parse (a closure is an option of X = a X|();). Additional cut expressions within what's parsed by protocol will make the error messages be closer to the expected points of failure in the input.
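To see why both pieces matter, here is a small hand-written Python model of the behaviour. It is only a sketch and not grako's implementation: the protocol syntax ('protocol', a name, then ';'), the exception classes Backtrack and Committed, and the check_eof flag are all made up for illustration. A failure before the cut lets the closure stop quietly, a failure after the cut is reported as a real error, and the end-of-input check catches anything left over.

```python
import re


class Backtrack(Exception):
    """Ordinary failure: the caller is free to give up quietly."""

    def __init__(self, pos, message):
        super().__init__(f"{message} at offset {pos}")
        self.pos = pos


class Committed(Backtrack):
    """Failure after a cut: the caller must report it, not swallow it."""


KEYWORD = re.compile(r"\s*protocol\b")
NAME = re.compile(r"\s*(\w+)")
SEMI = re.compile(r"\s*;")


def protocol(text, pos):
    """Hypothetical rule: protocol = 'protocol' ~ name ';' ;"""
    m = KEYWORD.match(text, pos)
    if m is None:
        raise Backtrack(pos, "expected 'protocol'")
    pos = m.end()
    # Cut: the keyword matched, so later failures are genuine syntax errors.
    m = NAME.match(text, pos)
    if m is None:
        raise Committed(pos, "expected a protocol name")
    name, pos = m.group(1), m.end()
    m = SEMI.match(text, pos)
    if m is None:
        raise Committed(pos, "expected ';'")
    return name, m.end()


def document(text, check_eof=True):
    """Hypothetical rule: document = {protocol}+ $ ; always returns a list."""
    names, pos = [], 0
    while True:
        try:
            name, pos = protocol(text, pos)
        except Committed:
            raise  # a cut was passed: surface the precise error
        except Backtrack:
            break  # no further protocol starts here: the closure ends
        names.append(name)
    if not names:
        raise Backtrack(pos, "expected at least one protocol")
    if check_eof and text[pos:].strip():
        # Plays the role of '$': leftover input is an error, not ignored.
        raise Backtrack(pos, "expected end of input")
    return names
```

With check_eof=False, document("protocol a; garbage") returns ['a'] and the rest is dropped without complaint, which is the behaviour described in the question. With the check enabled the same input raises an error, and an input such as "protocol a; protocol ;" fails inside the second protocol with a message about the missing name rather than a generic one about the document.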