I need a way to corre

32 Views Asked by At

I have an odd question I am hoping you can all help with. I have approximately 100,000 datapoints that have a year column and then an ID/name column. The problem is that when the data was entered each name was entered individually and manually so thousands of the names are spelt wrong. Please see below. Am I doomed to correct this manually or is there a script or something I can run to make it so the misspelled names fall in line with their correctly spelled counterparts. I will also mention there are cases where each name is spelt wrong (they are also all in Spanish.

Year ID and Name
2019 65544322-josef
2020 65544322-joseph
2021 65544322-joseph
2022 65544322-joseph
2023 65544322-joseph
2019 62234455-guayaquil
2020 62234455-guayaquil
2021 62234455-guayaquil
2022 62234455-guayaquil
2023 62234455-huayaquil

What I have been doing so far is manually checking this info by grouping the ids by the number of times they are seen in the database and then going back and manually correcting the misspelled names but this is going to take forever.

Pivot Count of ID and Name
62234455-guayaquil 4
62234455-huayaquil 1
65544322-josef 1
65544322-joseph 4

Any ideas?

0

There are 0 best solutions below