I'm not very strong in statistics, but I've still never seen this before. To compute the final measurement value for a test case, uarch-bench uses the following algorithm:
- find the minimal value in the population (the population is formed from the per-sample measurements; there are 33 samples plus 2 warmup samples that are excluded from the calculation, so we have 33 values);
- next, normalize those values by dividing each of them by the number of operations per loop inside a sample.
Note: there is one more step that subtracts the "base" measurements from the "bench" measurements, but I don't really use "base" values (mine is just a dummy_bench that costs nothing), so effectively I only have the "bench" population. A sketch of the whole calculation, as I understand it, is below.
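To make the question concrete, here is a minimal sketch of the aggregation described above. The function name and parameters are mine, not uarch-bench's actual identifiers, and I've left out the "base" subtraction since it's a no-op in my setup:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the aggregation as I understand it; identifiers are mine,
// not uarch-bench's actual names.
double final_value(const std::vector<double>& bench_samples, // 33 post-warmup samples
                   std::size_t ops_per_loop) {
    // Step 1: take the minimal ("best") raw measurement across the samples.
    double best = *std::min_element(bench_samples.begin(), bench_samples.end());
    // Step 2: normalize per operation inside the benchmark loop.
    // (The "base"/dummy_bench subtraction is omitted: it costs ~nothing here.)
    return best / static_cast<double>(ops_per_loop);
}
```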
So the question is simple: what is the purpose of taking the minimal value (I suppose I can also call it "the best" value) across samples? This seems unreliable to me; shouldn't the average give more stable results?
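For comparison, this is the aggregation I would naively expect instead (again, the identifiers are mine):

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// The alternative I had in mind: average across the 33 samples
// instead of taking the best one.
double mean_value(const std::vector<double>& bench_samples,
                  std::size_t ops_per_loop) {
    double sum = std::accumulate(bench_samples.begin(), bench_samples.end(), 0.0);
    return (sum / bench_samples.size()) / static_cast<double>(ops_per_loop);
}
```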