I have several code snippets. Some of them are auto-vectorized and some are not, and I can't tell why.
I test them on my MacBook Pro (M1) and run cargo rustc --bin rust-playground -- -C opt-level=3 --emit=llvm-ir to get the LLVM IR, then check whether each snippet was auto-vectorized.
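For reference, the IR check could be automated instead of done by eye. This is only a minimal sketch: the helper names mentions_vector_type and vector_lines are made up for illustration, and the sample IR lines are hand-written, not real compiler output. It just lists the lines that mention a fixed-width vector type such as <4 x i32>.

```rust
/// Returns true if `line` mentions an LLVM fixed-width vector type,
/// i.e. a '<' followed by digits and " x ", as in `<4 x i32>`.
/// (Hypothetical helper, not part of any tool.)
fn mentions_vector_type(line: &str) -> bool {
    let mut rest = line;
    while let Some(start) = rest.find('<') {
        let after = &rest[start + 1..];
        // Count the ASCII digits right after '<' (byte count == char count here).
        let digits = after.chars().take_while(|c| c.is_ascii_digit()).count();
        if digits > 0 && after[digits..].starts_with(" x ") {
            return true;
        }
        rest = after;
    }
    false
}

/// Collects the (trimmed) IR lines that mention a vector type.
fn vector_lines(ir: &str) -> Vec<&str> {
    ir.lines()
        .map(str::trim)
        .filter(|l| mentions_vector_type(l))
        .collect()
}

fn main() {
    // Hand-written sample lines for illustration only.
    let sample = "\
  %5 = add <4 x i32> %wide.load, %vec.phi
  %6 = add i32 %x, %y
  %7 = mul <4 x i32> %wide.load, %wide.load";
    for line in vector_lines(sample) {
        println!("{line}");
    }
}
```

In practice the string would come from the emitted .ll file; reading that file is left out here because its exact location depends on the build setup.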
The first three snippets share the same caller code:
fn main() {
    let a = [
        1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6,
        7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4,
        5, 6, 7, 8,
    ];
    println!("{:?}", auto_vec(&a));
}
- not auto-vectorized
pub fn auto_vec(a: &[i32]) -> i32 {
    let mut p = 0;
    for i in a {
        p += i;
    }
    p
}
- auto-vectorized
pub fn auto_vec(a: &[i32]) -> i32 {
    let mut p = 0;
    let mut q = 0;
    // the IR contains instructions like 'add <4 x i32>' and 'mul <4 x i32>'
    for i in a {
        p += i;
        q += i * i;
    }
    q - p
}
- not auto-vectorized
pub fn auto_vec(a: &[i32]) -> i32 {
    let mut p = 0;
    let mut q = 0;
    // the IR contains no instructions like 'add <4 x i32>' or 'mul <4 x i32>'
    for i in a {
        p += i;
        q += i * i;
    }
    // p and q are never read
    0
}
- the auto_vec function merged into main: partially auto-vectorized
fn main() {
    // the IR contains 'store <4 x i32>'
    let a = [
        1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6,
        7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4,
        5, 6, 7, 8,
    ];
    let mut p = 0;
    let mut q = 0;
    // the IR contains no 'add <4 x i32>' or 'mul <4 x i32>'
    for i in a {
        p += i;
        q += i * i;
    }
    println!("{:?}", q - p);
}
- I also find that the length of the array may affect the result: if it's too short, the compiler doesn't seem to vectorize the loop.
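One experiment that might help separate the effect of the input from the effect of the loop itself is sketched below. I have not verified that it changes the result. It keeps auto_vec out of line with #[inline(never)] and passes the slice through std::hint::black_box, which is only a best-effort hint to discourage the optimizer from specializing on known input. The probe function and its generated input are hypothetical and not part of my original code.

```rust
use std::hint::black_box;

// Same loop as the second snippet, kept as a separate function in the IR.
#[inline(never)]
pub fn auto_vec(a: &[i32]) -> i32 {
    let mut p = 0;
    let mut q = 0;
    for i in a {
        p += i;
        q += i * i;
    }
    q - p
}

// Hypothetical helper: builds the repeating 1..=8 pattern with `len` elements
// and hides it from the optimizer before calling auto_vec.
fn probe(len: usize) -> i32 {
    let a: Vec<i32> = (0..len).map(|k| (k % 8) as i32 + 1).collect();
    auto_vec(black_box(&a))
}

fn main() {
    // Try a short and a long input; the IR of auto_vec can then be inspected on its own.
    for len in [4, 64] {
        println!("len = {len}: {}", probe(len));
    }
}
```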
I'd be grateful if someone could explain what is going on.