VarHandle indirect access slower than direct access


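I have found a case where reading a field through a VarHandle held inside a wrapper object (indirect access) is much slower than calling the VarHandle directly. It seems like inlining doesn't work in that case. The JMH stack profiler also shows different results for the two benchmarks.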


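import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.TimeUnit;
import java.util.function.ToIntFunction;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@Fork(value = 1)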
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class DifferentAccessBenchmark {

    @State(Scope.Benchmark)
    public static class BenchState {

        public static final VarHandle DIRECT_VAR_HANDLE;
        public static final IntVarHandle INDIRECT_VAR_HANDLE;
        public static final FunctionMapping FUNCTION_MAPPING;

        static {
            try {
                DIRECT_VAR_HANDLE = MethodHandles.privateLookupIn(Employee.class, MethodHandles.lookup())
                    .findVarHandle(Employee.class, "department", int.class);
                INDIRECT_VAR_HANDLE = new IntVarHandle(DIRECT_VAR_HANDLE);
                FUNCTION_MAPPING = new FunctionMapping(employee -> employee.department);
            } catch (Throwable e) {
                throw new RuntimeException(e);
            }
        }

        Employee[] data;

        @Setup
        public void setup() {
            data = IntStream.range(1,10240)
                .mapToObj(Employee::new)
                .toArray(Employee[]::new);
        }
    }


    @Benchmark
    public long benchmark_direct_var_handle(BenchState state) {
        long acc = 0;

        for (int i = 0; i < state.data.length; i++) {
            // Cast to int so the call site matches the handle's (Employee)int access type.
            acc += (int) BenchState.DIRECT_VAR_HANDLE.get(state.data[i]);
        }

        return acc;
    }

    @Benchmark
    public long benchmark_indirect_var_handle(BenchState state) {
        long acc = 0;

        for (int i = 0; i < state.data.length; i++) {
            acc += BenchState.INDIRECT_VAR_HANDLE.getInt(state.data[i]);
        }

        return acc;
    }

    @Benchmark
    public long benchmark_to_int_function(BenchState state) {
        long acc = 0;

        for (int i = 0; i < state.data.length; i++) {
            acc += BenchState.FUNCTION_MAPPING.getInt(state.data[i]);
        }

        return acc;
    }

    public static final class Employee {
        public Employee(final int department) {
            this.department = department;
        }

        public int department;
    }


    public static class IntVarHandle {

        private final VarHandle varHandle;

        public IntVarHandle(VarHandle varHandle) {
            this.varHandle = varHandle;
        }

        public int getInt(Employee o) {
            return (int) varHandle.get(o);
        }
    }

    public static class FunctionMapping {

        private final ToIntFunction<Employee> intHandle;

        public FunctionMapping(ToIntFunction<Employee> intHandle) {
            this.intHandle = intHandle;
        }

        public int getInt(Employee o) {
            return intHandle.applyAsInt(o);
        }
    }
}

Results on an Apple M1:

Benchmark                                                Mode  Cnt       Score          Error  Units
DifferentAccessBenchmark.benchmark_direct_var_handle    thrpt    5  164317.970 ±  347.999  ops/s
DifferentAccessBenchmark.benchmark_indirect_var_handle  thrpt    5    7350.975 ±  320.375  ops/s
DifferentAccessBenchmark.benchmark_to_int_function      thrpt    5  165102.382 ± 6989.192  ops/s

Stack profiler:

Direct:

 33.3%  50.0% <stack is empty, everything is filtered?>
 33.3%  49.9% DifferentAccessBenchmark.benchmark_direct_var_handle
  0.0%   0.1% jmh_generated.DifferentAccessBenchmark_benchmark_direct_var_handle_jmhTest.benchmark_direct_var_handle_thrpt_jmhStub
  0.0%   0.0% java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.isReleasable
  0.0%   0.0% java.lang.reflect.AccessibleObject.verifyAccess
  0.0%   0.0% org.openjdk.jmh.runner.InfraControlL2.announceWarmupReady

Indirect:

 33.3%  50.0% <stack is empty, everything is filtered?>
 32.8%  49.2% java.lang.invoke.VarHandleGuards.guard_L_I
  0.5%   0.8% DifferentAccessBenchmark.benchmark_indirect_var_handle
  0.0%   0.0% java.lang.invoke.VarHandleGuards.guard_LLL_Z
  0.0%   0.0% org.openjdk.jmh.results.RawResults.<init>
  0.0%   0.0% java.lang.invoke.MethodHandle.maybeCustomize
  0.0%   0.0% java.lang.Object.hashCode

The indirect version is about 22 times slower than the other two. Can you help me investigate why? In the perfasm output it also looks like inlining doesn't work properly.
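One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: HotSpot can generally fold a VarHandle call down to a plain field load only when the handle itself is a JIT-time constant, typically a static final field. IntVarHandle reads its handle from a final instance field, which the JIT does not normally treat as constant. The JMH-free snippet below puts the two shapes side by side. The names ConstantHandleSketch, FieldHandleAccessor, StaticHandleAccessor, and makeData are made up for illustration. It only checks that both paths return the same sum and does not measure anything.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.stream.IntStream;

public class ConstantHandleSketch {

    public static final class Employee {
        public int department;

        public Employee(int department) {
            this.department = department;
        }
    }

    // Handle stored in a static final field, so the JIT can see it as a constant.
    static final VarHandle DEPARTMENT;

    static {
        try {
            DEPARTMENT = MethodHandles.lookup()
                .findVarHandle(Employee.class, "department", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Same shape as IntVarHandle in the question: the handle is loaded
    // from an instance field on every call.
    public static final class FieldHandleAccessor {
        private final VarHandle varHandle;

        public FieldHandleAccessor(VarHandle varHandle) {
            this.varHandle = varHandle;
        }

        public int getInt(Employee e) {
            return (int) varHandle.get(e);
        }
    }

    // Hypothetical alternative: the wrapper refers to the static final handle directly.
    public static final class StaticHandleAccessor {
        public int getInt(Employee e) {
            return (int) DEPARTMENT.get(e);
        }
    }

    public static Employee[] makeData(int n) {
        return IntStream.range(0, n)
            .mapToObj(Employee::new)
            .toArray(Employee[]::new);
    }

    public static long sumViaFieldHandle(Employee[] data) {
        FieldHandleAccessor accessor = new FieldHandleAccessor(DEPARTMENT);
        long acc = 0;
        for (Employee e : data) {
            acc += accessor.getInt(e);
        }
        return acc;
    }

    public static long sumViaStaticHandle(Employee[] data) {
        StaticHandleAccessor accessor = new StaticHandleAccessor();
        long acc = 0;
        for (Employee e : data) {
            acc += accessor.getInt(e);
        }
        return acc;
    }

    public static void main(String[] args) {
        Employee[] data = makeData(10_240);
        // Both paths read the same fields, so the sums must agree.
        System.out.println(sumViaFieldHandle(data) == sumViaStaticHandle(data));
    }
}
```

Porting the two accessors into the JMH benchmark would show whether the instance-field load accounts for the time spent in VarHandleGuards.guard_L_I.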
