import re

def find_valid_dates(dt):
    # Match numeric dates (d-mm-yy or d-mm-yyyy) or "d <Dutch month name> yyyy".
    result = re.findall(r"\d{1,2}-\d{2}-\d{2,4}|\d{1,2}\s(?:januari|februari|maart|april|mei|juni|juli|augustus|september|oktober|november|december)\s\d{1,4}", dt)
    return result
SaaOne_msi_vervangen['valid_dates'] = SaaOne_msi_vervangen['Oplossingstekst'].apply(find_valid_dates)
test = SaaOne_msi_vervangen['valid_dates'].date.apply(lambda str_list : [item.dt.strftime("%d-%m-%Y") for item in eval(str_list)])
I am trying to achieve the following:
- Convert the date strings in each list in the dataframe to the same format
- Remove duplicate elements from each list in the dataframe
Here is some example data:
SaaOne_msi_vervangen = pd.DataFrame({"valid_dates": [['8-10-2019', '08-10-2019', '08-10-2019', '09-10-2019', '09-10-2019', '09-10-19', '21-10-19', '23-10-19', '23-10-2019', '23-10-2019', '23-10-2019', '23-10-2019', '24-10-19', '23 oktober 2019', '23 oktober 2019', '23 oktober 2019'],['31-10-19', '19-11-01', '06-11-19', '29-11-2019', '03-12-19', '03-12-19', '5-12-2019', '04-12-19', '05-12-2019', '05-12-2019', '05-12-2019', '05-12-2019', '10-12-19', '5 december 2019']]})
You can try:
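One way to do this, as a minimal sketch rather than a definitive solution: explode the lists to one date per row, rewrite every string as dd-mm-yyyy, then drop repeats within each original row. The helper `normalize_date` and the `MONTHS` table are hypothetical names, and the sample is a shortened version of the data in the question. The sketch assumes numeric dates are day-first and that two-digit years mean 20xx, so an ambiguous value such as '19-11-01' would be read as 19 November 2001.

```python
from datetime import date

import pandas as pd

# Dutch month names mapped to month numbers.
MONTHS = {
    "januari": 1, "februari": 2, "maart": 3, "april": 4, "mei": 5, "juni": 6,
    "juli": 7, "augustus": 8, "september": 9, "oktober": 10, "november": 11,
    "december": 12,
}


def normalize_date(text):
    """Rewrite one date string as dd-mm-yyyy (hypothetical helper)."""
    # Numeric dates are split on "-", textual ones ("23 oktober 2019") on spaces.
    day, month, year = text.split("-") if "-" in text else text.split()
    month = month.lower()
    month_number = MONTHS[month] if month in MONTHS else int(month)
    year = int(year)
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    parsed = date(year, month_number, int(day))  # raises ValueError on impossible dates
    return f"{parsed.day:02d}-{parsed.month:02d}-{parsed.year:04d}"


# Shortened version of the sample data from the question.
df = pd.DataFrame({"valid_dates": [
    ["8-10-2019", "08-10-2019", "09-10-19", "23 oktober 2019", "23-10-19"],
    ["31-10-19", "5-12-2019", "05-12-2019", "5 december 2019"],
]})

# One date per row; the index keeps the label of the row each date came from.
long_form = df["valid_dates"].explode().map(normalize_date, na_action="ignore")

# Drop a date only when it repeats within the same original row.
long_form = long_form.reset_index().drop_duplicates().set_index("index")["valid_dates"]

print(long_form)
```

Because duplicates are dropped on the (row label, date) pair, the same date can still appear in two different rows, and the order of first appearance is kept.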
Prints:
To get back to a list form, then:
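As a sketch of that step, assuming the de-duplicated dates sit in a long-form Series whose index still holds the original row labels (the small `long_form` below is a stand-in built by hand), grouping on the index collects each row's values back into a list:

```python
import pandas as pd

# Stand-in for an exploded, normalized, de-duplicated Series;
# the index carries the original row labels.
long_form = pd.Series(
    ["08-10-2019", "09-10-2019", "23-10-2019", "31-10-2019", "05-12-2019"],
    index=[0, 0, 0, 1, 1],
    name="valid_dates",
)

# Group on the index level and gather each group's values into a list.
as_lists = long_form.groupby(level=0).agg(list)

print(as_lists)
```

The result has one entry per original row label, so it can be assigned back to a column of the original dataframe when the indexes match.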
Prints: