I have the following dataframe:
gp_columns = {
'name': ['companyA', 'companyB'],
'firm_ID' : [1, 2],
'timestamp_one' : ['2016-04-01', '2017-09-01']
}
fund_columns = {
'firm_ID': [1, 1, 2, 2, 2],
'department_ID' : [10, 11, 20, 21, 22],
'timestamp_mult' : ['2015-01-01', '2016-03-01', '2016-10-01', '2017-02-01', '2018-11-01'],
'number' : [400, 500, 1000, 3000, 4000]
}
gp_df = pd.DataFrame(gp_columns)
fund_df = pd.DataFrame(fund_columns)
gp_df['timestamp_one'] = pd.to_datetime(gp_df['timestamp_one'])
fund_df['timestamp_mult'] = pd.to_datetime(fund_df['timestamp_mult'])
merged_df = gp_df.merge(fund_df)
merged_df
merged_df_v1 = merged_df.copy()
merged_df_v1['incidence_num'] = merged_df.groupby('firm_ID')['department_ID']\
.transform('cumcount')
merged_df_v1['incidence_num'] = merged_df_v1['incidence_num'] + 1
merged_df_v1['time_delta'] = merged_df_v1['timestamp_mult'] - merged_df_v1['timestamp_one']
merged_wide = pd.pivot(merged_df_v1, index = ['name','firm_ID', 'timestamp_one'], \
columns = 'incidence_num', \
values = ['department_ID', 'time_delta', 'timestamp_mult', 'number'])
merged_wide.reset_index()
My question is how i get a column that calculates the minimum time delta (so closest to 0). Note that the time delta can be negative or positive, so .abs() does not work for me here.


You can
stack(which removes NaTs) andgroupby.firstafter sorting the rows by absolute value (with thekeyparameter ofsort_values):output: