I have a sorted RDD to which I have already applied some filters. It is not a key-value pair RDD.
I want to remove some rows from the RDD: given two consecutive rows, I would like to drop the second one if certain elements are the same in both.
I tried zipping the RDD with itself, but what I get is pairs of the exact same row.
Could this be done with some group/reduce… ?
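
To make the intended behaviour concrete, here is a minimal sketch in Python. It rests on several assumptions that are not stated above: that PySpark is the binding in use, that each row is a tuple or list indexable by position, and that "some elements are the same" means equality on a fixed set of positions. The names `key_of`, `drop_repeats_local`, `drop_repeats_rdd` and `cols` are hypothetical. The first function shows the desired result on a plain list; the second is one possible way to get the same result on an RDD by pairing each row with its predecessor through `zipWithIndex` and a join, rather than zipping the RDD with itself.

```python
def key_of(row, cols):
    """Project a row onto the positions that decide whether two rows match."""
    return tuple(row[c] for c in cols)


def drop_repeats_local(rows, cols):
    """Plain-Python reference: keep a row unless it matches its predecessor on cols."""
    kept = []
    prev_key = object()  # sentinel that never equals a real key
    for row in rows:
        k = key_of(row, cols)
        if k != prev_key:
            kept.append(row)
        prev_key = k
    return kept


def drop_repeats_rdd(rdd, cols):
    """Same idea on an RDD (assumes the RDD is already in the desired order)."""
    # (index, row) for every row, following the RDD's current order
    indexed = rdd.zipWithIndex().map(lambda ri: (ri[1], ri[0]))
    # (index + 1, key): after the join, row i sees the key of row i - 1
    prev_keys = indexed.map(lambda ir: (ir[0] + 1, key_of(ir[1], cols)))
    # (index, (row, previous key or None for the very first row))
    joined = indexed.leftOuterJoin(prev_keys)
    kept = joined.filter(
        lambda kv: kv[1][1] is None or key_of(kv[1][0], cols) != kv[1][1]
    )
    # The join does not preserve order, so sort by index before dropping it
    return kept.sortByKey().map(lambda kv: kv[1][0])
```

With rows `[("a", 1, "x"), ("a", 1, "y"), ("b", 1, "z"), ("a", 1, "w")]` and `cols = [0, 1]`, the local version keeps the first, third and fourth rows. Because equality on fixed positions is transitive, comparing each row with its immediate predecessor keeps exactly the first row of every run. The join causes a shuffle, so this is a sketch of the idea rather than a tuned solution.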