I'm trying to find a way to go through my RDD and produce a new one that doesn't include repeated longitude/latitude pairs. However, I can't seem to get distinct() to work for this. I've tried .distinct(lambda station: (station[1], station[2])), but this doesn't work. Each record in the RDD holds a station name, longitude, and latitude. Below is sample input and the desired output.
Input:
[["Station A",11.002,10.22],
["Station B",17.86,13.49],
["Station C",12.52,12.22],
["Station D",11.002,10.22]]
Output (Station D removed, since its position is the same as Station A's):
[["Station A",11.002,10.22],
["Station B",17.86,13.49],
["Station C",12.52,12.22]]
As stated, I have tried:
.distinct(lambda station: (station[1], station[2]))
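To make the intent concrete, here is a minimal plain-Python sketch of the behaviour I'm after: key each record by its (longitude, latitude) pair and keep only the first record seen for each key. The helper name `dedupe_by_position` is hypothetical, and this runs on an ordinary list rather than an RDD. My assumption is that the RDD version would follow the same idea (key by coordinates, then keep one record per key), as noted in the comment, though which duplicate survives and the output order may not be guaranteed there.

```python
def dedupe_by_position(stations):
    """Keep the first station seen for each (longitude, latitude) pair.

    `stations` is a list of [name, longitude, latitude] records.
    Hypothetical helper, for illustration only.
    """
    first_seen = {}
    for station in stations:
        key = (station[1], station[2])  # tuples are hashable, lists are not
        # setdefault stores the record only if the key is not present yet
        first_seen.setdefault(key, station)
    # dicts preserve insertion order (Python 3.7+), so input order is kept
    return list(first_seen.values())


stations = [
    ["Station A", 11.002, 10.22],
    ["Station B", 17.86, 13.49],
    ["Station C", 12.52, 12.22],
    ["Station D", 11.002, 10.22],
]

unique_stations = dedupe_by_position(stations)
print(unique_stations)

# Assumed RDD analogue of the same idea (not run here):
#   rdd.map(lambda s: ((s[1], s[2]), s)).reduceByKey(lambda a, b: a).values()
```

With the sample input above, Station D is dropped because it shares its coordinates with Station A.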