A few questions regarding serialization in Flink. In order to ensure Flink state to evolve safely and in a backwards compatible way, I decided to represent all non-primitive state types as Protobuf messages.
Question 1: are there any downsides to representing state with Protobufs? It looks like POJO might be faster / more efficient, but that performance hit is okay for my use case. Tradeoff is towards safety since there's better tooling available to ensure only compatible changes are made to Protobuf message schemas.
Question 2: is it sufficient to just using the interface type (eg. Message.class) when specifying serializers? Or, should I be adding/registering each concrete message type? Just the interface seems to work, but I'm not sure if there are gotchas here that I should consider.
Question 3: Does it matter how I specify ProtobufSerializer with Flink, addDefaultKryoSerializer or registerTypeWithKryoSerializer? I understand there's this mostly impacts Kryo's reference IDs, but I also understand the IDs are re-calculated within an object graph. So, do the IDs really matter in that case? Are there any situations where the same object graph would see classes in different orders?
Code sample of the various options described above:
javaEnv.addDefaultKryoSerializer(Message.class, ProtobufSerializer.class); // Option 1
javaEnv.registerTypeWithKryoSerializer(Message.class, ProtobufSerializer.class); // Option 2
javaEnv.addDefaultKryoSerializer(FooPb.class, ProtobufSerializer.class); // Option 3
javaEnv.registerTypeWithKryoSerializer(FooPb.class, ProtobufSerializer.class); // Option 4