Trying to find the optimal way to combine slices into a database key.
I need a slice that contains three things (concatenated):
- A prefix byte (u8)
- A slice that is passed in of a compile-time known size (20 bytes)
- A slice that is passed in of a compile-time known size (32 bytes)
I've tried benchmarking the following 3 functions. I would have expected "slot_key_array" to be the fastest, since its not allocating a vector, but "slot_key_sized" is consistently the fastest. Why?
#[inline]
pub fn slot_key_vec(address: &[u8], slot: &[u8]) -> Vec<u8> {
let mut key: Vec<u8> = vec![DB_PREFIX_SLOT];
key.extend_from_slice(address);
key.extend_from_slice(slot);
key
}
#[inline]
pub fn slot_key_sized(address: &[u8], slot: &[u8]) -> Vec<u8> {
let mut key: Vec<u8> = Vec::with_capacity(1 + 20 + 32);
key.push(DB_PREFIX_SLOT);
key.extend_from_slice(address);
key.extend_from_slice(slot);
key
}
#[inline]
pub fn slot_key_array(address: &[u8], slot: &[u8]) -> [u8; 53] {
let key: [u8; 53] = [
DB_PREFIX_SLOT,
address[0], address[1], address[2], address[3], address[4], address[5], address[6], address[7],
address[8], address[9], address[10], address[11], address[12], address[13], address[14], address[15],
address[16], address[17], address[18], address[19],
slot[0], slot[1], slot[2], slot[3], slot[4], slot[5], slot[6], slot[7],
slot[8], slot[9], slot[10], slot[11], slot[12], slot[13], slot[14], slot[15],
slot[16], slot[17], slot[18], slot[19], slot[20], slot[21], slot[22], slot[23],
slot[24], slot[25], slot[26], slot[27], slot[28], slot[29], slot[30], slot[31],
];
key
}
You are probably on Linux, where the default allocator is good (I benchmarked on Windows, by replacing the allocator to
mimallocI got the same results).This is because the array versions need to do bounds checking for each element, which prevents vectorizing and adds overhead. But they don't have to: if we make the arguments arrays instead of slices, the bounds checking will be gone.
If you only have slices, you can convert them to arrays with
try_into():Another alternative is to use
copy_from_slice():Benchmark:
So, your version with arrays instead of slice is the fastest together with
try_into()(same performance), thencopy_from_slice(), and then the others.Yet another alternative is to help LLVM eliminate the bound checks. It has problems here because you index in increasing order: we check for index 0, but that means nothing about index 1, so we check it too, then 2, then 3... but if we check the last index first, LLVM knows that all previous indices also exist:
This brings this code on par with the arrays version: