I am trying this code, but I need a generic way to delete duplicates from a DataFrame:
import pandas as pd
# making data frame from csv file
data = pd.read_csv("C:/Users/gvsph/Downloads/employees.csv")
# sorting by first name
data.sort_values("First Name", inplace=True)
# dropping ALL rows whose "First Name" is duplicated (keep=False keeps none of them)
data.drop_duplicates(subset="First Name",
                     keep=False, inplace=True)
# displaying data
print(data)
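You can call 'drop_duplicates' without arguments to remove fully duplicated rows from your dataset: it compares all columns and keeps the first occurrence of each duplicated row. It returns a new DataFrame unless you pass inplace=True.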
Cf. the pandas docs for DataFrame.drop_duplicates.
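
As a minimal sketch of the difference, assuming a small made-up DataFrame in place of employees.csv (the "Team" column and all values are hypothetical, for illustration only):

```python
import pandas as pd

# Hypothetical sample data standing in for employees.csv
data = pd.DataFrame({
    "First Name": ["Ann", "Bob", "Ann", "Ann"],
    "Team": ["Sales", "IT", "Sales", "HR"],
})

# No arguments: compare all columns, keep the first occurrence of each row
deduped = data.drop_duplicates()

# keep=False: drop every row that has an exact duplicate elsewhere
no_repeats = data.drop_duplicates(keep=False)

print(deduped)
print(no_repeats)
```

Neither call modifies data here, since inplace is left at its default. If you only care about duplicates in certain columns, pass subset= as in the question.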