I've generally used immutable value types when writing Java code. Sometimes it's been through libraries (Immutables, AutoValue, Lombok), but mostly just vanilla Java classes with:
- all final fields
- a constructor with all fields as parameters
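For example, a minimal sketch of the kind of class I mean (the name and fields are made up for illustration; equals/hashCode omitted for brevity):

```java
// A plain immutable value type: final fields, all-args constructor, getters only.
public final class Trip {
    private final String origin;
    private final String destination;
    private final int distanceKm;

    public Trip(String origin, String destination, int distanceKm) {
        this.origin = origin;
        this.destination = destination;
        this.distanceKm = distanceKm;
    }

    public String getOrigin() { return origin; }
    public String getDestination() { return destination; }
    public int getDistanceKm() { return distanceKm; }
}
```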
(This question is for Java 11 and below, given current Spark support.)
In Spark SQL, data types require an Encoder. When using an off-the-shelf encoder such as Encoders.bean(MyType.class), such an immutable data type results in an "illegal reflective access operation" warning.
I'm curious what the Spark SQL (Dataset) approach is here. Obviously I could relax this and make it a mutable POJO.
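Concretely, the sort of usage that produced the warning looks roughly like this (Trip is the illustrative class above; the rest is standard Spark setup):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class TripDatasetExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("immutable-bean-example")
                .master("local[*]")
                .getOrCreate();

        // Off-the-shelf bean encoder for the (immutable) value type.
        Encoder<Trip> tripEncoder = Encoders.bean(Trip.class);

        List<Trip> trips = Arrays.asList(
                new Trip("AMS", "LHR", 371),
                new Trip("LHR", "JFK", 5540));

        // Creating and showing the Dataset is where the encoder/reflection machinery runs.
        Dataset<Trip> ds = spark.createDataset(trips, tripEncoder);
        ds.show();

        spark.stop();
    }
}
```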
Update
Looking into the code for Encoders.bean, it really does have to be a classic, mutable POJO: the reflection code looks for appropriate setters. Further (and this is documented), the only supported collection types are array, list, and map (not set).
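To make that concrete, the shape the bean encoder's reflection expects is the classic JavaBean, roughly (an illustrative sketch, not my actual class):

```java
// The classic mutable POJO shape that Encoders.bean expects:
// a public no-arg constructor and a getter/setter pair per field.
public class TripBean {
    private String origin;
    private String destination;
    private int distanceKm;

    public TripBean() { }

    public String getOrigin() { return origin; }
    public void setOrigin(String origin) { this.origin = origin; }

    public String getDestination() { return destination; }
    public void setDestination(String destination) { this.destination = destination; }

    public int getDistanceKm() { return distanceKm; }
    public void setDistanceKm(int distanceKm) { this.distanceKm = distanceKm; }
}
```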
This was actually a misdiagnosis: the immutability of my data type was not causing the reflective access issues. It was a JVM 11+ issue (mostly noted here: https://github.com/renaissance-benchmarks/renaissance/issues/241).
By adding the following JVM arguments, everything works correctly:
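The flags in question are the usual JDK 11+ --add-opens options that open the JDK-internal packages Spark reflects into; the list below is illustrative only, and the exact set should be matched to whichever packages your own run warns about:

```
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
```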