I have a dataframe with 5 columns (a, b, c, d, e). I then have specific values for columns a, b and c and need to interpolate along the dataframe to get the corresponding values for columns d and e.
As a simplified case, I tried the following:
import random

import numpy as np
import pandas as pd

a = np.array(random.sample(range(1, 1001), 100))
b = 2 * a
c = 3 * a
d = 4 * a
e = 5 * a
data = {'a': a,
        'b': b,
        'c': c,
        'd': d,
        'e': e}
df = pd.DataFrame(data)
to_interp = {'a': [200.0, 525.0],
             'b': [400.0, 1050.0],
             'c': [600.0, 1575.0]}
new_df = pd.DataFrame(to_interp)
df = pd.concat([df, new_df], ignore_index=True)
df.sort_values('a', inplace=True)
df['d'] = df['d'].interpolate(method='linear', limit_direction='forward', axis=0)
df['e'] = df['e'].interpolate(method='linear', limit_direction='forward', axis=0)
interpolated_values = df[df['a'].isin(to_interp['a'])][['a', 'b', 'c', 'd', 'e']].copy()
print(interpolated_values)
but for this simplified case I am getting:
      a       b       c       d       e
  200.0   400.0   600.0   800.0  1000.0
  525.0  1050.0  1575.0  2106.0  2632.5
which doesn't look right when I look at the row defined by a=525.
I'm not sure what I'm doing wrong so any help would be appreciated.
Thank you!
You should fix the seed with random.seed(0) and provide the exact expected output for clarity, but I imagine that you want to interpolate relative to a (in which case, set it as the index and use method='index').
Then, merge and combine_first:
Output (using random.seed(0) to define the input):
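The approach described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the original code: it uses a small hand-picked grid for a (instead of the random sample) so that the numbers are reproducible, and the names combined, is_new, linear_d, interp, picked and result are illustrative. It also computes d with method='linear' for comparison, since that method spaces rows evenly by position and ignores the actual values of a.

```python
import numpy as np
import pandas as pd

# Known samples on an unevenly spaced grid (hypothetical values chosen for
# illustration; b, c, d, e are multiples of a as in the question).
a = np.array([100.0, 150.0, 300.0, 500.0, 600.0, 1000.0])
df = pd.DataFrame({'a': a, 'b': 2 * a, 'c': 3 * a, 'd': 4 * a, 'e': 5 * a})

# Rows to fill in: a, b and c are known, d and e are missing.
new_df = pd.DataFrame({'a': [200.0, 525.0],
                       'b': [400.0, 1050.0],
                       'c': [600.0, 1575.0]})

# Stack known and unknown rows and sort by a.
combined = pd.concat([df, new_df], ignore_index=True).sort_values('a')
is_new = combined['a'].isin(new_df['a'])

# For comparison: 'linear' treats rows as equally spaced, ignoring a.
linear_d = combined['d'].interpolate(method='linear')[is_new]

# Interpolate relative to a: make it the index and use method='index'.
interp = (combined.set_index('a')
                  .interpolate(method='index')
                  .reset_index())

# Pick the interpolated rows for the requested a values with merge,
# then fill the missing d and e columns of new_df with combine_first.
picked = new_df[['a']].merge(interp, on='a', how='left')
result = new_df.combine_first(picked)

print(linear_d.tolist())
print(result)
```

On this grid, method='linear' gives d=900 and d=2200 for the two new rows, while method='index' gives d=800 and d=2100 (with e=1000 and e=2625), which matches d=4*a and e=5*a.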