How to build a numbered list parser in nom?

I'd like to parse a numbered list using nom in Rust.

For example, 1. Milk 2. Bread 3. Bacon.

I could use separated_list1 with an appropriate separator parser and element parser.

fn parser(input: &str) -> IResult<&str, Vec<&str>> {
    preceded(
        tag("1. "),
        separated_list1(
            tuple((tag(" "), digit1, tag(". "))),
            take_while(is_alphabetic),
        ),
    )(input)
}

However, this does not validate the increasing index numbers.

For example, it would happily parse invalid lists like 1. Milk 3. Bread 4. Bacon or 1. Milk 8. Bread 1. Bacon.

It seems there is no built-in nom parser that can do this, so I tried to build my first custom parser.

My idea was to implement a parser similar to separated_list1 that keeps track of the index and passes it to the separator as an argument. It would accept a closure that creates the separator parser from the index.

fn parser(input: &str) -> IResult<&str, Vec<&str>> {
    preceded(
        tag("1. "),
        separated_list1(
            |index: i32| tuple((tag(" "), tag(&index.to_string()), tag(". "))),
            take_while(is_alphabetic),
        ),
    )(input)
}

I took the implementation of separated_list1 and changed the bound on the separator argument to G: FnOnce(i32) -> Parser<I, O2, E>. I then created an index variable (let mut index = 1;), passed it to sep(index) in the loop, and incremented it at the end of each iteration (index += 1;).

However, Rust's type system is not happy!

How can I make this work?


Here's the full code for reproduction:

use nom::{
    error::{ErrorKind, ParseError},
    Err, IResult, InputLength, Parser,
};

pub fn separated_numbered_list1<I, O, O2, E, F, G>(
    mut sep: G,
    mut f: F,
) -> impl FnMut(I) -> IResult<I, Vec<O>, E>
where
    I: Clone + InputLength,
    F: Parser<I, O, E>,
    G: FnOnce(i32) -> Parser<I, O2, E>,
    E: ParseError<I>,
{
    move |mut i: I| {
        let mut res = Vec::new();
        let mut index = 1;

        // Parse the first element
        match f.parse(i.clone()) {
            Err(e) => return Err(e),
            Ok((i1, o)) => {
                res.push(o);
                i = i1;
            }
        }

        loop {
            let len = i.input_len();
            match sep(index).parse(i.clone()) {
                Err(Err::Error(_)) => return Ok((i, res)),
                Err(e) => return Err(e),
                Ok((i1, _)) => {
                    // infinite loop check: the parser must always consume
                    if i1.input_len() == len {
                        return Err(Err::Error(E::from_error_kind(i1, ErrorKind::SeparatedList)));
                    }

                    match f.parse(i1.clone()) {
                        Err(Err::Error(_)) => return Ok((i, res)),
                        Err(e) => return Err(e),
                        Ok((i2, o)) => {
                            res.push(o);
                            i = i2;
                        }
                    }
                }
            }
            index += 1;
        }
    }
}

MeetTitan On

Try combining many1(), separated_pair(), and verify() manually. A Cell holds the next expected index, and verify() only accepts a number that matches it:

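use std::cell::Cell;

use nom::{
    bytes::complete::tag,
    character::complete::{alpha1, digit1, multispace0},
    combinator::{map_res, verify},
    multi::many1,
    sequence::{preceded, separated_pair},
    IResult,
};

fn validated(input: &str) -> IResult<&str, Vec<(u32, &str)>> {
    // Next index we expect to see; Cell allows mutation from inside an Fn closure.
    let current_index = Cell::new(1u32);
    let number = map_res(digit1, |s: &str| s.parse::<u32>());
    // Accept the number only if it matches the expected index, then advance it.
    let valid = verify(number, |digit: &u32| {
        let i = current_index.get();
        if digit == &i {
            current_index.set(i + 1);
            true
        } else {
            false
        }
    });
    let pair = preceded(multispace0, separated_pair(valid, tag(". "), alpha1));
    // Bind the result to a temporary so the parser borrowing current_index is
    // dropped before current_index itself. This will not compile without the binding.
    let tmp = many1(pair)(input);
    tmp
}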

#[test]
fn test_success() {
    let input = "1. Milk 2. Bread 3. Bacon";
    assert_eq!(validated(input), Ok(("", vec![(1, "Milk"), (2, "Bread"), (3, "Bacon")])));
}

#[test]
fn test_fail() {
    let input = "2. Bread 3. Bacon 1. Milk";
    validated(input).unwrap_err();
}