How to distinguish between minus sign and negative number in nom?

Using the parser combinator library nom, how can I write a parser that distinguishes the two meanings of the minus sign in the terms 1-2 and 1*-2?

In the first term, I expect the tokens 1, - and 2. In the second, the minus sign marks the number as negative, so the expected tokens are 1, * and -2, not 1, *, - and 2.

How can I make nom stateful, with user-defined states such as expect_literal: bool?

Best answer, by Matthias

The best solution I have found so far is to use nom_locate with a span type that carries user-defined state, defined as

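use nom::{branch::alt, bytes::complete::{tag, take_while}, combinator::{map, opt}};
use nom::{error::ErrorKind, IResult};
use nom_locate::LocatedSpanEx;

// Lexer state that travels with every span.
#[derive(Clone, PartialEq, Debug)]
struct LexState {
    // True when a '-' may start a negative literal (for example after an operator).
    pub accept_literal: bool,
}

type Span<'a> = LocatedSpanEx<&'a str, LexState>;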

Then you can modify the state via

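// Applied to the result of a sub-parser: on success, stores `value` in the
// state carried by the remaining input; errors pass through unchanged.
fn set_accept_literal(
    value: bool,
    code: IResult<Span, TokenPayload>,
) -> IResult<Span, TokenPayload> {
    code.map(|(mut rest, token)| {
        rest.extra.accept_literal = value;
        (rest, token)
    })
}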

where TokenPayload is an enum representing my token content.

Now you can write the operator parser:

fn mathematical_operators(code: Span) -> IResult<Span, TokenPayload> {
    set_accept_literal(
        true,
        alt((
            map(tag("*"), |_| TokenPayload::Multiply),
            map(tag("/"), |_| TokenPayload::Divide),
            map(tag("+"), |_| TokenPayload::Add),
            map(tag("-"), |_| TokenPayload::Subtract),
            map(tag("%"), |_| TokenPayload::Remainder),
        ))(code),
    )
}

And the integer parser as:

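fn parse_integer(code: Span) -> IResult<Span, TokenPayload> {
    let chars = "1234567890";
    // Optional leading minus sign.
    let (code, sign) = opt(tag("-"))(code)?;
    let sign = sign.is_some();
    // A '-' counts as a sign only where a literal is expected; otherwise fail
    // so that another parser (such as the operator parser) can consume it.
    if sign && !code.extra.accept_literal {
        return Err(nom::Err::Error((code, ErrorKind::IsNot)));
    }
    let (code, slice) = take_while(move |c| chars.contains(c))(code)?;
    match slice.fragment.parse::<i32>() {
        // After a number, the next '-' must be an operator.
        Ok(value) => set_accept_literal(
            false,
            Ok((code, TokenPayload::Int32(if sign { -value } else { value }))),
        ),
        // No digits, or a value that does not fit in an i32.
        Err(_) => Err(nom::Err::Error((code, ErrorKind::Tag))),
    }
}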
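
One edge case in the integer parser: the digits are parsed as a positive i32 and negated afterwards, so -2147483648 (i32::MIN) is rejected, because 2147483648 does not fit in an i32. A minimal sketch of one way around this, assuming the sign and the digits are available as a single string slice; parse_signed is a hypothetical helper for illustration, not part of nom or of the code above:

```rust
/// Parses an optional '-' followed by ASCII digits as one signed i32.
/// Hypothetical helper for illustration only.
fn parse_signed(text: &str) -> Option<i32> {
    let digits = text.strip_prefix('-').unwrap_or(text);
    // Reject empty input and anything other than plain digits ("+5", "--5").
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    // Parsing the sign together with the digits keeps i32::MIN representable.
    text.parse::<i32>().ok()
}
```

Whether this matters depends on the language being lexed; for small literals the negate-afterwards approach behaves the same.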

This might not win a beauty contest, but it works. The remaining pieces should be trivial.
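
To make the role of the flag easier to follow, here is a dependency-free sketch of the same idea without nom: operators set accept_literal to true, numbers set it to false, and a '-' is read as a sign only while the flag is true. The names Token and tokenize are hypothetical, the input is assumed to contain no whitespace, and the flag is assumed to start out true so that an expression may begin with a negative number.

```rust
// Sketch of the accept_literal idea using only the standard library.
#[derive(Debug, PartialEq)]
enum Token {
    Int(i32),
    Op(char),
}

/// Splits an expression into tokens, treating '-' as a sign only where a
/// literal is expected (at the start of the input or right after an operator).
fn tokenize(input: &str) -> Result<Vec<Token>, String> {
    let bytes = input.as_bytes();
    let mut tokens = Vec::new();
    // Assumption: the input may begin with a (possibly negative) literal.
    let mut accept_literal = true;
    let mut pos = 0;

    while pos < bytes.len() {
        let start = pos;
        // A '-' belongs to the number only while a literal is expected.
        if bytes[pos] == b'-' && accept_literal {
            pos += 1;
        }
        let digits_start = pos;
        while pos < bytes.len() && bytes[pos].is_ascii_digit() {
            pos += 1;
        }
        if pos > digits_start {
            // Digits were consumed: parse sign and digits as one literal.
            let text = &input[start..pos];
            let value = text
                .parse::<i32>()
                .map_err(|e| format!("{}: {}", text, e))?;
            tokens.push(Token::Int(value));
            // After a number, the next '-' must be an operator.
            accept_literal = false;
        } else if pos == start && b"*/+-%".contains(&bytes[pos]) {
            tokens.push(Token::Op(bytes[pos] as char));
            pos += 1;
            // After an operator, a '-' may start a negative literal.
            accept_literal = true;
        } else {
            return Err(format!("unexpected input at byte {}", start));
        }
    }
    Ok(tokens)
}
```

With this sketch, tokenize("1-2") yields Int(1), Op('-'), Int(2), while tokenize("1*-2") yields Int(1), Op('*'), Int(-2). The nom version above threads the same flag through the span's extra field instead of a local variable.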