What are the rules for auto-vectorization in Rust?


I have several code snippets. Some of them are auto-vectorized and some are not, and I can't tell why.

I tested them on my MacBook Pro (M1) by running cargo rustc --bin rust-playground -- -C opt-level=3 --emit=llvm-ir and checking the emitted LLVM IR for vector instructions.

The first three share the same caller code:

fn main() {
    let a = [
        1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6,
        7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4,
        5, 6, 7, 8,
    ];
    println!("{:?}", auto_vec(&a));
}
  1. not auto-vectorized
pub fn auto_vec(a: &[i32]) -> i32 {
    let mut p = 0;
    for i in a {
        p += i;
    }
    p
}
  2. auto-vectorized
pub fn auto_vec(a: &[i32]) -> i32 {
    let mut p = 0;
    let mut q = 0;
    // the IR contains instructions such as 'add <4 x i32>' and 'mul <4 x i32>'
    for i in a {
        p += i;
        q += i * i;
    }
    q - p
}
  3. not auto-vectorized
pub fn auto_vec(a: &[i32]) -> i32 {
    let mut p = 0;
    let mut q = 0;
    // the IR contains no 'add <4 x i32>' or 'mul <4 x i32>' instructions
    for i in a {
        p += i;
        q += i * i;
    }
    // p and q are never read after the loop
    0
}
  4. auto_vec merged into main: partially auto-vectorized
fn main() {
    // the IR contains 'store <4 x i32>' for the array initialization
    let a = [
        1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6,
        7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4,
        5, 6, 7, 8,
    ];
    let mut p = 0;
    let mut q = 0;
    // the IR contains no 'add <4 x i32>' or 'mul <4 x i32>'
    for i in a {
        p += i;
        q += i * i;
    }
    println!("{:?}", q - p);
}
  5. The length of the array also seems to affect the result: if it is too short, the compiler does not appear to vectorize the loop.
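
A minimal sketch of a more isolated test harness, assuming that part of the difference between the snippets comes from the loop being inlined into main and folded against the constant array (an assumption, not something verified here). It reuses the loop from snippet 2; `#[inline(never)]` and `std::hint::black_box` are standard Rust features, but `black_box` is only a best-effort hint, so the resulting IR still needs to be checked by hand.

```rust
use std::hint::black_box;

// Same loop as snippet 2. #[inline(never)] asks the compiler to keep this
// as a separate function, so its IR can be read on its own instead of
// being merged into main.
#[inline(never)]
pub fn auto_vec(a: &[i32]) -> i32 {
    let mut p = 0;
    let mut q = 0;
    for i in a {
        p += i;
        q += i * i;
    }
    q - p
}

fn main() {
    // Build the same 64 values (1..=8 repeated 8 times) at run time
    // rather than as a constant array literal.
    let a: Vec<i32> = (0..64).map(|k| k % 8 + 1).collect();

    // black_box is a hint that discourages the optimizer from assuming
    // anything about the input; it gives no hard guarantee.
    let result = auto_vec(black_box(a.as_slice()));
    println!("{:?}", result);
}
```

With this layout, the body of auto_vec can be searched for `<4 x i32>` in the emitted IR separately from whatever the compiler does to main.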

I'd appreciate any help understanding this.
