I have a column (name of column: KI_2783023 (Please enter supplier B-BBEE certificate expiry date:) in my SAP_Supplier_Yes data frame that has the data as below format.
KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)
0 Thu, 7 Jul, 2022
4 Thu, 22 Jul, 2021
5 NaN
6 Tue, 31 May, 2022
7 Thu, 5 Aug, 2021
... ...
10431 NaN
10434 NaN
10437 NaN
10438 NaN
10440 NaN
I have also created a data dictionary:
Dt_Mnth_Dict = {'Jan': '01', 'Feb': '02', 'Mar': '03','Apr': '04', 'May': '05', 'Jun': '06','Jul': '07', 'Aug': '08', 'Sep': '09', 'Oct': '10', 'Nov': '11', 'Dec': '12'}
I have also created new columns in the df to pick out some contents of the data from the above-mentioned column as follows:
#creating columns [Date] from sliced data string from the SAP_Supplier_Yes DataFrame:
SAP_Supplier_Yes['Date_of_Exp_Date'] = SAP_Supplier_Yes['KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)'].str[5:7]
#creating columns [Year] from sliced data string from the SAP_Supplier_Yes DataFrame
SAP_Supplier_Yes['Year_of_Exp_Date'] = SAP_Supplier_Yes['KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)'].str[-4:]
#slicing the data to pick out the month
SAP_Supplier_Yes['Month_of_Exp_Date'] = SAP_Supplier_Yes['KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)'].str[7:11]
#removing a comma character from the Month column
SAP_Supplier_Yes['Month_of_Exp_Date_'] = SAP_Supplier_Yes['Month_of_Exp_Date'].str.replace(',', '')
#using the Dt_Mnth_Dict dictionary to replace the Month 'word' with the number as per the dictionary
SAP_Supplier_Yes['Month_of_Exp_Date_'].map(Dt_Mnth_Dict) # note: if the dictionary does not exhaustively map all and entries then non-matched entries are changed to NaNs
#creating the FINAL date field (concatenating all three created above)
SAP_Supplier_Yes['KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)_'] = SAP_Supplier_Yes['Date_of_Exp_Date'] + SAP_Supplier_Yes['Month_of_Exp_Date_'] + SAP_Supplier_Yes['Year_of_Exp_Date']
When I view the 'new' column, SAP_Supplier_Yes['KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)_']:, I get the dates in this format:
0 7 Jul2022
4 22 Jul2021
5 NaN
6 31 May2022
7 5 Aug2021
...
10431 NaN
10434 NaN
10437 NaN
10438 NaN
10440 NaN
Instead of the following format
0 7 072022
4 22 072021
5 NaN
6 31 042022
7 5 082021
...
10431 NaN
10434 NaN
10437 NaN
10438 NaN
10440 NaN
Also, I still want to change this field to a Date field with the format 'dd-mm-yyyy', as my Data Quality business rules are based on this column and related calculations?
Thanking you in advance
I have tried the following and it worked fine (although it gave me a red Python message). Please see my solution below:
#created a date column #SAP_Supplier_Yes['Date_of_Exp_Date'] = SAP_Supplier_Yes['date_field'] = SAP_Supplier_Yes['KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)'].str[5:7] #removing the space in the field SAP_Supplier_Yes['date_field_'] = SAP_Supplier_Yes['date_field'].str.replace(' ', '')
#created a 'Month' column SAP_Supplier_Yes['Month_of_Exp_Date'] = SAP_Supplier_Yes['KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)'].str[7:11] #created another column to remove the [special character] comma and replacing it with nothing SAP_Supplier_Yes['Month_of_Exp_Date_'] = SAP_Supplier_Yes['Month_of_Exp_Date'].str.replace(',', '') #created another column to remove any spaces SAP_Supplier_Yes['Month_ONLY'] = SAP_Supplier_Yes['Month_of_Exp_Date_'].str.replace(' ', '')
#created a Year column SAP_Supplier_Yes['Year_of_Exp_Date'] = SAP_Supplier_Yes['KI_2783023 (Please enter supplier B-BBEE certificate expiry date:)'].str[-4:] #created another column to remove spaces SAP_Supplier_Yes['Year_of_Exp_Date'] = SAP_Supplier_Yes['Year_of_Exp_Date'].str.replace(' ', '')
#concatenated all three to create the FINAL field SAP_Supplier_Yes['FINAL_B-BBEE certificate expiry date'] = SAP_Supplier_Yes['date_field_'] + SAP_Supplier_Yes['Month_ONLY'] + SAP_Supplier_Yes['Year_of_Exp_Date']
#Creating another column to convert this field into a Date field/column SAP_Supplier_Yes['FINAL_B-BBEE_Certificate_Expiry_Date_'] = pd.to_datetime(SAP_Supplier_Yes['FINAL_B-BBEE certificate expiry date'])
Is there an shorter code I can use?