I am struggling with how to create an instance of `Functor[Dataset]`. The problem is that when you map from `A` to `B`, an `Encoder[B]` must be in implicit scope, but I am not sure how to provide it.
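```scala
import cats.Functor
import org.apache.spark.sql.Dataset
implicit val datasetFunctor: Functor[Dataset] = new Functor[Dataset] {
  // Fails to compile: Dataset.map needs an implicit Encoder[B] that is not in scope
  override def map[A, B](fa: Dataset[A])(f: A => B): Dataset[B] = fa.map(f)
}
```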
Of course, this code fails to compile because no `Encoder[B]` is available, but I can't add `Encoder[B]` as an implicit parameter because that would change the signature of `map`. How can I solve this?
You cannot apply `f` right away, because you are missing the `Encoder`. The only obvious direct solution would be to take `cats` and re-implement all the interfaces, adding an implicit `Encoder` argument. I don't see any way to implement a `Functor` for `Dataset` directly.

However, maybe the following substitute solution is good enough. What you could do is create a wrapper for the dataset that has a `map` method without the implicit `Encoder`, but additionally has a method `toDataset`, which needs the `Encoder` only at the very end.

For this wrapper, you could apply a construction that is very similar to the so-called `Coyoneda` construction (or `Coyo`? What do they call it today? I don't know...). It is essentially a way to implement a "free functor" for an arbitrary type constructor.

Here is a sketch (it compiles with cats 1.0.1; the `Spark` traits are replaced by dummies):

Now you can wrap a dataset `ds` into a `MappedDataset(ds)`, then `map` it using the implicit `MappedDatasetFunctor` as many times as you want, and then call `toDataset` at the very end, where you can supply a concrete `Encoder` for the final result.

Note that this will combine all functions inside `map` into a single Spark stage: it won't be able to save the intermediate results, because the `Encoder`s for all intermediate steps are missing.

I'm not quite there yet with studying `cats`, so I cannot guarantee that this is the most idiomatic solution. Probably there is something `Coyoneda`-esque already in the library.

EDIT: There is `Coyoneda` in the cats library, but it requires a natural transformation `F ~> G` to a functor `G`. Unfortunately, we don't have a `Functor` for `Dataset` (that was the problem in the first place). What my implementation above does is this: instead of a `Functor[G]`, it requires a single morphism of the (non-existent) natural transformation at a fixed `X` (this is what the `Encoder[X]` is).
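A minimal, self-contained sketch of the `MappedDataset` wrapper described above might look like the following. It is a hypothetical illustration of the idea, not the answer's own code: `Encoder` and `Dataset` are dummy stand-ins for the Spark types, and `Functor` is a local stand-in with the same abstract `map` as `cats.Functor`, so the example runs without Spark or cats. The member names `Src`, `fun`, `fromParts` and the `Example` object are invented for illustration.

```scala
// Hypothetical dummies standing in for Spark's Encoder and Dataset (not the real Spark API).
trait Encoder[A]

final case class Dataset[A](rows: List[A]) {
  // Same shape as Spark's Dataset.map: an Encoder for the result type is required.
  def map[B: Encoder](f: A => B): Dataset[B] = Dataset(rows.map(f))
}

// Local stand-in with the same abstract method as cats.Functor.
trait Functor[F[_]] {
  def map[A, B](fa: F[A])(f: A => B): F[B]
}

// Coyoneda-style wrapper: a dataset of some hidden type Src plus a
// pending function Src => B that has not been applied yet.
abstract class MappedDataset[B] {
  type Src
  val source: Dataset[Src]
  val fun: Src => B

  // Only here, at the very end, is an Encoder needed (for the final type B).
  def toDataset(implicit enc: Encoder[B]): Dataset[B] = source.map(fun)
}

object MappedDataset {
  // Wrap a dataset, starting with the identity function as the pending map.
  def apply[A](ds: Dataset[A]): MappedDataset[A] = fromParts(ds, (a: A) => a)

  def fromParts[A, B](ds: Dataset[A], g: A => B): MappedDataset[B] =
    new MappedDataset[B] {
      type Src = A
      val source: Dataset[A] = ds
      val fun: A => B = g
    }

  // map never touches the Dataset: it only composes the pending function,
  // so no Encoder is required and the Functor signature stays unchanged.
  implicit object MappedDatasetFunctor extends Functor[MappedDataset] {
    def map[A, B](fa: MappedDataset[A])(f: A => B): MappedDataset[B] =
      fromParts(fa.source, fa.fun andThen f)
  }
}

object Example {
  // Only the final element type (String) gets an Encoder; Int never needs one.
  implicit val stringEncoder: Encoder[String] = new Encoder[String] {}

  def pipeline(ds: Dataset[Int]): Dataset[String] = {
    val F = implicitly[Functor[MappedDataset]]
    val step1 = F.map(MappedDataset(ds))(_ * 10)  // pending, no Encoder[Int] needed
    val step2 = F.map(step1)(n => s"value=$n")    // still pending
    step2.toDataset                               // Encoder[String] required here
  }

  def main(args: Array[String]): Unit =
    println(pipeline(Dataset(List(1, 2, 3))).rows) // prints List(value=10, value=20, value=30)
}
```

In `pipeline`, only an `Encoder[String]` is ever demanded; the intermediate `Int` step never asks for one, which is also why intermediate results cannot be materialized. With real cats, you would drop the local `Functor` trait, `import cats.Functor` instead, and keep the same `map` implementation, since `map` is the only abstract method a `cats.Functor` instance has to provide.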