Compare consecutive rows in an RDD and filter out matches


I have a sorted RDD to which I have already applied some filters. It is not a key-value pair RDD.

I want to remove rows from the RDD: given two consecutive rows, I would like to remove the second one if certain elements are the same in both.

I tried zipping the RDD with itself, but that only gives me pairs of the exact same row.

Could this be done with some kind of group or reduce operation?
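A group or reduce is probably not the right fit here: grouping by the compared fields merges all rows with equal values, not just adjacent ones. In an input like `(1, 'a'), (1, 'b'), (2, 'c'), (1, 'e')`, grouping on the first field would collapse all three `1` rows into one, whereas the consecutive rule keeps `(1, 'e')` because its predecessor is `(2, 'c')`. Since the RDD is already sorted, a single streaming pass per partition is an alternative. The generator below is a hypothetical helper (`drop_consecutive` and its `key` argument are illustrative names) intended to be passed to `rdd.mapPartitions`, shown here on plain lists.

```python
from typing import Callable, Hashable, Iterable, Iterator, Tuple


def drop_consecutive(
    rows: Iterable[Tuple], key: Callable[[Tuple], Hashable]
) -> Iterator[Tuple]:
    # Hypothetical helper, intended usage:
    #   rdd.mapPartitions(lambda it: drop_consecutive(it, key))
    sentinel = object()  # marks "no previous row seen yet"
    prev_key = sentinel
    for row in rows:
        k = key(row)
        # Keep the row if it is the first one or its key differs from its predecessor's.
        if prev_key is sentinel or k != prev_key:
            yield row
        prev_key = k  # always compare against the immediate predecessor


rows = [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (1, "e")]
print(list(drop_consecutive(rows, key=lambda r: r[0])))
# prints [(1, 'a'), (2, 'c'), (1, 'e')]
```

One caveat when the data spans several partitions: `mapPartitions` processes each partition independently, so the first row of a partition is never compared with the last row of the previous one. Calling `rdd.coalesce(1)` first avoids this at the cost of parallelism; otherwise the rows at partition boundaries need separate handling.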
