Disambiguating expressions in Lark syntax

Question

Asked by Jonathon Reinhart on 05 February 2024 at 05:41

I have a Lark grammar that I think should be unambiguous, but depending on the version of Lark, it fails to parse one input or another:

import lark

syntax = r"""
stmt: mov_stmt
    | special_stmt

mov_stmt: reg ASSIGN (reg | const)
special_stmt: ("RS" SPECIAL_ASSIGN const)

reg: REG
const: DEC_NUM

REG.2: /R[0-7]|RS/
DEC_NUM: /0|[1-9]\d*/i

ASSIGN: "="
SPECIAL_ASSIGN: "&="

WS: /[ \t]+/
%ignore WS
"""

parser = lark.Lark(syntax, start="stmt", parser="lalr")

print(parser.parse("R3 = 7"))   # 1. ok
print(parser.parse("R3 = R7"))  # 2. ok
print(parser.parse("RS &= 1"))  # 3. Fails on lark==1.1.9; expected special_stmt
print(parser.parse("RS = R7"))  # 4. Fails on lark-parser==0.12.0; expected mov_stmt

With lark-parser==0.12.0, call 4 fails. I expect a mov_stmt, but the parser expects a SPECIAL_ASSIGN token, meaning it is matching special_stmt:

lark.exceptions.UnexpectedToken: Unexpected token Token('ASSIGN', '=') at line 1, column 4.
Expected one of:
        * SPECIAL_ASSIGN
Previous tokens: [Token('RS', 'RS')]

With lark==1.1.9, the opposite happens and call 3 fails. I expect a special_stmt, but the parser expects an ASSIGN token, meaning it is matching mov_stmt:

lark.exceptions.UnexpectedToken: Unexpected token Token('SPECIAL_ASSIGN', '&=') at line 1, column 4.
Expected one of:
        * ASSIGN
Previous tokens: [Token('REG', 'RS')]

In my mind, the grammar should be unambiguous. An = always means mov_stmt, and &= always means special_stmt (which is only valid when the register is RS).

How do I disambiguate this?

I tried assigning priorities to different terminals, to no effect.

Answer

**Jonathon Reinhart** · Accepted Answer · 2024-02-07T03:05:07.453000

This fixes the grammar:

stmt: mov_stmt
    | special_stmt

mov_stmt: reg ASSIGN (reg | const)
special_stmt: (special_reg SPECIAL_ASSIGN const)
//             ^^^^^^^^^^^

reg: REG | SPECIAL_REG      // <<<
special_reg: SPECIAL_REG    // <<<
const: DEC_NUM

REG.2: /R[0-7]/             // <<<
SPECIAL_REG.2: "RS"         // <<<

DEC_NUM: /0|[1-9]\d*/i

ASSIGN: "="
SPECIAL_ASSIGN: "&="

WS: /[ \t]+/
%ignore WS

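It seems the problem was that the text "RS" matched two terminals: the REG regex, and the anonymous terminal created by the "RS" literal in special_stmt. The lexer emits only one token for a given piece of input, and the tracebacks in the question show which one won in each version: lark-parser 0.12.0 produced an RS token, so mov_stmt could not match, while lark 1.1.9 produced a REG token, so special_stmt could not match.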
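The overlap between the two terminals can be illustrated without Lark at all. The following is only a toy sketch using the standard `re` module; the `candidates` helper and the two terminal tables are hypothetical names for illustration, and this is not how Lark's lexer is implemented. It only shows that "RS" is claimed by two terminals in the original grammar and by one in the fixed grammar.

```python
import re

# Terminals from the original grammar: the REG regex also matches "RS",
# and the "RS" literal in special_stmt acts as a terminal of its own.
ORIGINAL_TERMINALS = {
    "REG": r"R[0-7]|RS",
    "RS": r"RS",
}

# Terminals from the fixed grammar: "RS" is removed from REG.
FIXED_TERMINALS = {
    "REG": r"R[0-7]",
    "SPECIAL_REG": r"RS",
}


def candidates(terminals, text):
    """Return the sorted names of all terminals whose pattern matches all of `text`."""
    return sorted(
        name
        for name, pattern in terminals.items()
        if re.fullmatch(pattern, text)
    )


print(candidates(ORIGINAL_TERMINALS, "RS"))  # two terminals compete
print(candidates(FIXED_TERMINALS, "RS"))     # exactly one terminal matches
print(candidates(FIXED_TERMINALS, "R3"))     # exactly one terminal matches
```

When two terminals compete for the same text, the lexer has to pick one by some tie-breaking rule, so the outcome can change if that rule changes. When only one terminal matches, there is nothing to tie-break.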

With a separate SPECIAL_REG terminal that is also included in the reg rule, "RS" is always lexed as the same token, and the parser can then choose between mov_stmt and special_stmt based on the operator that follows.
