Record Linkage In Pyspark

How can I get recordlinkage-style functionality in PySpark? I want to run a similarity check between the Name column of Dataset 1 and the Name column of Dataset 2.

Please suggest a library for this if one is available for PySpark.

I tried the Python recordlinkage library, but it only works with pandas DataFrames.

Nick Crews On

Splink is the best option that I know of.