I have an odd question I am hoping you can all help with. I have approximately 100,000 datapoints that have a year column and then an ID/name column. The problem is that when the data was entered each name was entered individually and manually so thousands of the names are spelt wrong. Please see below. Am I doomed to correct this manually or is there a script or something I can run to make it so the misspelled names fall in line with their correctly spelled counterparts. I will also mention there are cases where each name is spelt wrong (they are also all in Spanish.
| Year | ID and Name |
|---|---|
| 2019 | 65544322-josef |
| 2020 | 65544322-joseph |
| 2021 | 65544322-joseph |
| 2022 | 65544322-joseph |
| 2023 | 65544322-joseph |
| 2019 | 62234455-guayaquil |
| 2020 | 62234455-guayaquil |
| 2021 | 62234455-guayaquil |
| 2022 | 62234455-guayaquil |
| 2023 | 62234455-huayaquil |
What I have been doing so far is manually checking this info by grouping the ids by the number of times they are seen in the database and then going back and manually correcting the misspelled names but this is going to take forever.
| Pivot | Count of ID and Name |
|---|---|
| 62234455-guayaquil | 4 |
| 62234455-huayaquil | 1 |
| 65544322-josef | 1 |
| 65544322-joseph | 4 |
Any ideas?