Which is the most idiomatic way to parse an i32 from ascii in Rust

77 Views Asked by At

I'm trying to find both the quickest and the most idiomatic way(s) to parse an i32 from a vector containing Ascii in Rust. I have three functions (minus error handling etc):

#![feature(test)]

extern crate test;

fn v_0() -> i32 {
    let read_buf: Vec<u8> = vec![54, 52, 52];
    let num: i32 = String::from_utf8(read_buf)
        .unwrap()
        .parse()
        .unwrap();

    assert_eq!(num, 644);
    num
}

fn v_1() -> i32 {
    let read_buf: Vec<u8> = vec![54, 52, 52];

    let num: i32 = read_buf.iter().rev().enumerate().map(|(idx, val)| {
        (val - 48) as i32 * 10.pow(idx) 
    }).sum();

    assert_eq!(num, 644);
    num
}

fn v_2() -> i32 {
    use atoi::atoi;

    let read_buf: Vec<u8> = vec![54, 52, 52];
    let num = atoi(read_buf[..].try_into().unwrap()).unwrap();

    assert_eq!(num, 644);
    num
}

#[cfg(test)]
mod tests {
    use test::Bencher;

    use crate::{v_0, v_1, v_2};

    #[bench]
    fn v_0_bench(b: &mut test::Bencher) {
        let n = test::black_box(1000);
        b.iter(v_0)
    }

    #[bench]
    fn v_1_bench(b: &mut test::Bencher) {
        let n = test::black_box(1000);
        b.iter(v_1)
    }

    #[bench]
    fn v_2_bench(b: &mut test::Bencher) {
        let n = test::black_box(1000);
        b.iter(v_2)
    }
}

And the benchmark results are as follows:

test tests::v_0_bench ... bench:          16 ns/iter (+/- 0)
test tests::v_1_bench ... bench:           0 ns/iter (+/- 0)
test tests::v_2_bench ... bench:          11 ns/iter (+/- 1)

I'm suspecting that the v_1 bench is getting optimized away, so as a question within a question, how do I prevent this?

Out of the 3 example functions, if any, which should one use and why?

1

There are 1 best solutions below

5
prog-fh On

Playing around this on my computer gives the following results.

test tests::v_0_bench ... bench:       1,126 ns/iter (+/- 27)
test tests::v_1_bench ... bench:         767 ns/iter (+/- 12)
test tests::v_2_bench ... bench:       1,728 ns/iter (+/- 4)
test tests::v_3_bench ... bench:         424 ns/iter (+/- 2)
test tests::v_4_bench ... bench:         438 ns/iter (+/- 2)

The buffer has been removed from the benchmark and black_box() is used in order to pretend the parameter is unpredictable and the result is used.
The sequence of bytes has been made longer in order to let the algorithms actually work, and the invocations are repeated many times in order to introduce bigger differences in the results.

Comparing v_0 and v_1 shows that a substantial part of the time is spent in checking the utf8 string.
v_2 is probably disappointing because of its pow() operation.
v_3 is clearly faster, but it does not check anything; the result could be meaningless because the bytes could contain anything but digits, and the resulting integer may also overflow if the sequence is too long!
v_4 is similar to v_3 but considers an eventual starting - sign (because an i32 is expected, not an u32); this first check does not seem to be too detrimental to performances.

If we are absolutely certain that the sequence of bytes is correct, then we could lean towards the fastest version (v_3 or v_4), but in any other situation we should definitely prefer v_0.

#![feature(test)] // cargo bench
extern crate test; // not in Cargo.toml

#[inline(always)]
fn v_0(buf: &[u8]) -> i32 {
    std::str::from_utf8(buf).unwrap().parse().unwrap()
}

#[inline(always)]
fn v_1(buf: &[u8]) -> i32 {
    unsafe { std::str::from_utf8_unchecked(buf) }
        .parse()
        .unwrap()
}

#[inline(always)]
fn v_2(buf: &[u8]) -> i32 {
    buf.iter()
        .rev()
        .enumerate()
        .map(|(idx, val)| (val - b'0') as i32 * 10_i32.pow(idx as u32))
        .sum()
}

#[inline(always)]
fn v_3(buf: &[u8]) -> i32 {
    buf.iter().fold(0, |acc, b| acc * 10 + (b - b'0') as i32)
}

#[inline(always)]
fn v_4(buf: &[u8]) -> i32 {
    let mut it = buf.iter();
    let b0 = *it.next().unwrap();
    let init = if b0 == b'-' { 0 } else { (b0 - b'0') as i32 };
    let num = it.fold(init, |acc, b| acc * 10 + (b - b'0') as i32);
    if b0 == b'-' {
        -num
    } else {
        num
    }
}

fn make_buf() -> Vec<u8> {
    b"1234567890".to_vec()
}

#[inline(always)]
fn run(
    mut f: impl FnMut(&[u8]) -> i32,
    buf: &[u8],
) {
    for _ in 0..100 {
        test::black_box(f(test::black_box(buf)));
    }
}

#[bench]
fn v_0_bench(b: &mut test::Bencher) {
    let buf = make_buf();
    b.iter(|| run(v_0, &buf))
}

#[bench]
fn v_1_bench(b: &mut test::Bencher) {
    let buf = make_buf();
    b.iter(|| run(v_1, &buf))
}

#[bench]
fn v_2_bench(b: &mut test::Bencher) {
    let buf = make_buf();
    b.iter(|| run(v_2, &buf))
}

#[bench]
fn v_3_bench(b: &mut test::Bencher) {
    let buf = make_buf();
    b.iter(|| run(v_3, &buf))
}

#[bench]
fn v_4_bench(b: &mut test::Bencher) {
    let buf = make_buf();
    b.iter(|| run(v_4, &buf))
}

fn main() {
    let buf = make_buf();
    println!("v_0 {}", v_0(&buf));
    println!("v_1 {}", v_1(&buf));
    println!("v_2 {}", v_2(&buf));
    println!("v_3 {}", v_3(&buf));
    println!("v_4 {}", v_4(&buf));
}
/*
v_0 1234567890
v_1 1234567890
v_2 1234567890
v_3 1234567890
v_4 1234567890
*/