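I have found a case where indirect access (a VarHandle wrapped in another object) is much slower than calling the same VarHandle directly. It looks like inlining doesn't happen in the indirect case, and the stack profiler shows different results for the two benchmarks.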
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.TimeUnit;
import java.util.function.ToIntFunction;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@Fork(value = 1)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class DifferentAccessBenchmark {

    @State(Scope.Benchmark)
    public static class BenchState {
        // All three accessors are held in static final fields.
        public static final VarHandle DIRECT_VAR_HANDLE;
        public static final IntVarHandle INDIRECT_VAR_HANDLE;
        public static final FunctionMapping FUNCTION_MAPPING;

        static {
            try {
                DIRECT_VAR_HANDLE = MethodHandles.privateLookupIn(Employee.class, MethodHandles.lookup())
                        .findVarHandle(Employee.class, "department", int.class);
                INDIRECT_VAR_HANDLE = new IntVarHandle(DIRECT_VAR_HANDLE);
                FUNCTION_MAPPING = new FunctionMapping(employee -> employee.department);
            } catch (Throwable e) {
                throw new RuntimeException(e);
            }
        }

        Employee[] data;

        @Setup
        public void setup() {
            data = IntStream.range(1, 10240)
                    .mapToObj(Employee::new)
                    .toArray(Employee[]::new);
        }
    }

    // Calls the static final VarHandle directly.
    @Benchmark
    public long benchmark_direct_var_handle(BenchState state) {
        long acc = 0;
        for (int i = 0; i < state.data.length; i++) {
            // The (int) cast is required: without it the polymorphic call returns Object.
            acc += (int) BenchState.DIRECT_VAR_HANDLE.get(state.data[i]);
        }
        return acc;
    }

    // Calls the same VarHandle through the IntVarHandle wrapper.
    @Benchmark
    public long benchmark_indirect_var_handle(BenchState state) {
        long acc = 0;
        for (int i = 0; i < state.data.length; i++) {
            acc += BenchState.INDIRECT_VAR_HANDLE.getInt(state.data[i]);
        }
        return acc;
    }

    // Reads the field through a lambda wrapped in FunctionMapping.
    @Benchmark
    public long benchmark_to_int_function(BenchState state) {
        long acc = 0;
        for (int i = 0; i < state.data.length; i++) {
            acc += BenchState.FUNCTION_MAPPING.getInt(state.data[i]);
        }
        return acc;
    }

    public static final class Employee {
        public int department;

        public Employee(final int department) {
            this.department = department;
        }
    }

    // Wrapper that keeps the VarHandle in a final instance field.
    public static class IntVarHandle {
        private final VarHandle varHandle;

        public IntVarHandle(VarHandle varHandle) {
            this.varHandle = varHandle;
        }

        public int getInt(Employee o) {
            return (int) varHandle.get(o);
        }
    }

    // Wrapper that keeps a ToIntFunction in a final instance field.
    public static class FunctionMapping {
        private final ToIntFunction<Employee> intHandle;

        public FunctionMapping(ToIntFunction<Employee> intHandle) {
            this.intHandle = intHandle;
        }

        public int getInt(Employee o) {
            return intHandle.applyAsInt(o);
        }
    }
}
Results on an Apple M1:
Benchmark                                                Mode  Cnt       Score      Error  Units
DifferentAccessBenchmark.benchmark_direct_var_handle    thrpt    5  164317.970 ±  347.999  ops/s
DifferentAccessBenchmark.benchmark_indirect_var_handle  thrpt    5    7350.975 ±  320.375  ops/s
DifferentAccessBenchmark.benchmark_to_int_function      thrpt    5  165102.382 ± 6989.192  ops/s
Stack profiler output.
Direct (benchmark_direct_var_handle):
33.3% 50.0% <stack is empty, everything is filtered?>
33.3% 49.9% DifferentAccessBenchmark.benchmark_direct_var_handle
0.0% 0.1% jmh_generated.DifferentAccessBenchmark_benchmark_direct_var_handle_jmhTest.benchmark_direct_var_handle_thrpt_jmhStub
0.0% 0.0% java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.isReleasable
0.0% 0.0% java.lang.reflect.AccessibleObject.verifyAccess
0.0% 0.0% org.openjdk.jmh.runner.InfraControlL2.announceWarmupReady
Indirect (benchmark_indirect_var_handle):
33.3% 50.0% <stack is empty, everything is filtered?>
32.8% 49.2% java.lang.invoke.VarHandleGuards.guard_L_I
0.5% 0.8% DifferentAccessBenchmark.benchmark_indirect_var_handle
0.0% 0.0% java.lang.invoke.VarHandleGuards.guard_LLL_Z
0.0% 0.0% org.openjdk.jmh.results.RawResults.<init>
0.0% 0.0% java.lang.invoke.MethodHandle.maybeCustomize
0.0% 0.0% java.lang.Object.hashCode
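The indirect version is roughly 22 times slower than the direct one (about 7,351 vs 164,318 ops/s), while the ToIntFunction wrapper is as fast as direct access. The stack profile attributes almost all samples in the indirect benchmark to VarHandleGuards.guard_L_I, and in the perfasm output it looks like the VarHandle call is not inlined. Can you help me investigate why?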
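
To rule out that the two paths do different work, here is a minimal JMH-free sketch of the direct and wrapped access. It only checks that both return the same sum and does not measure performance. The class name AccessPathsCheck and the helpers makeData, sumDirect and sumIndirect are made up for this illustration; the wrapper has the same shape as IntVarHandle in the benchmark.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class AccessPathsCheck {

    static final class Employee {
        int department;

        Employee(int department) {
            this.department = department;
        }
    }

    // Same shape as IntVarHandle in the benchmark: the VarHandle sits in a final instance field.
    static final class IntVarHandle {
        private final VarHandle varHandle;

        IntVarHandle(VarHandle varHandle) {
            this.varHandle = varHandle;
        }

        int getInt(Employee o) {
            return (int) varHandle.get(o);
        }
    }

    static final VarHandle DIRECT;
    static final IntVarHandle INDIRECT;

    static {
        try {
            DIRECT = MethodHandles.lookup()
                    .findVarHandle(Employee.class, "department", int.class);
            INDIRECT = new IntVarHandle(DIRECT);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Hypothetical helper: employees with departments 1..n.
    static Employee[] makeData(int n) {
        Employee[] data = new Employee[n];
        for (int i = 0; i < n; i++) {
            data[i] = new Employee(i + 1);
        }
        return data;
    }

    // Sums the field through the static final VarHandle.
    static long sumDirect(Employee[] data) {
        long acc = 0;
        for (Employee e : data) {
            acc += (int) DIRECT.get(e);
        }
        return acc;
    }

    // Sums the field through the wrapper object.
    static long sumIndirect(Employee[] data) {
        long acc = 0;
        for (Employee e : data) {
            acc += INDIRECT.getInt(e);
        }
        return acc;
    }

    public static void main(String[] args) {
        Employee[] data = makeData(10_239); // same element count as IntStream.range(1, 10240)
        long direct = sumDirect(data);
        long indirect = sumIndirect(data);
        if (direct != indirect) {
            throw new AssertionError(direct + " != " + indirect);
        }
        System.out.println("both paths return " + direct);
    }
}
```

Since both paths read the same field through the same VarHandle, the throughput gap should come from how the JIT compiles the wrapped call rather than from the amount of work done.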