Extract array value using index stored in pandas dataframe


I have a numpy array:

import numpy as np
import pandas as pd

np.random.seed(123456)
a = np.random.randint(0, 9, (5, 5))

array([[1, 2, 1, 8, 0],
       [7, 4, 8, 4, 2],
       [6, 6, 7, 2, 6],
       [2, 4, 4, 7, 4],
       [4, 4, 5, 1, 7]])

and a pandas DataFrame holding the (row, column) indices of the values I want to extract:

df = pd.DataFrame([[0,1],[1,2],[2,3]],columns=['i','j'])

In the past I used:

df['vals'] = df[['i', 'j']].apply(lambda x: a[x[0], x[1]], axis=1)

   i  j  vals
0  0  1     2
1  1  2     8
2  2  3     2

This gives me the values I want (2, 8, 2), but it has started raising the following warning:

FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use ser.iloc[pos]
  df['vals'] = df[['i', 'j']].apply(lambda x: a[x[0], x[1]], axis=1)
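The warning comes from how apply passes rows: with axis=1, each row reaches the lambda as a Series whose index is the column labels i and j, not 0 and 1. So x[0] only works through pandas' deprecated fallback from label lookup to positional lookup. A minimal sketch that inspects one such row (the array is typed out literally from the output above, so the example does not depend on the random generator):

```python
import numpy as np
import pandas as pd

# The array shown above, written out literally instead of regenerated from the seed
a = np.array([[1, 2, 1, 8, 0],
              [7, 4, 8, 4, 2],
              [6, 6, 7, 2, 6],
              [2, 4, 4, 7, 4],
              [4, 4, 5, 1, 7]])
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=["i", "j"])

# apply(axis=1) hands the lambda one row at a time, as a Series like this one
row = df[["i", "j"]].iloc[0]
print(row.index.tolist())  # ['i', 'j'] -> the labels, not 0 and 1

# Positional access (iloc) and label access (["i"]) reach the same value explicitly
print(row.iloc[0], row["i"])
```

Because the row's index holds strings, an integer key like 0 is not a label, which is exactly the case pandas is deprecating.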

As suggested in the linked answer, I tried using df.loc[:, 'vals'] = df.loc[:, ['i', 'j']].apply(lambda x: a[x[0], x[1]], axis=1), but I still get the warning. As also suggested there, I tried df['vals'] = df[['i', 'j']].apply(lambda x: a[*x], axis=1), but that gives me a syntax error.

Can anyone suggest a better way to do this? I don't like to silence warnings because I usually learn something from them.

Answer by Timeless:

You're not doing what the linked Q/A suggests.

Using iloc or a starred expression does get rid of the FutureWarning:

# Timeless
df["vals"] = df[["i", "j"]].apply(lambda x: a[x.iloc[0], x.iloc[1]], axis=1)

# Corralien
df["vals"] = df[["i", "j"]].apply(lambda x: a[(..., *x)], axis=1) # or a[(*x,)]
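About the syntax error with a[*x]: unpacking directly inside square brackets is only valid syntax from Python 3.11 onward (PEP 646). On older versions the star has to sit inside an explicit tuple, as in the a[(*x,)] variant above, or you can write a[tuple(x)]. The sketch below checks that the iloc and tuple-based lambdas return the same values; it escalates FutureWarning to an error so any remaining positional fallback would fail loudly instead of printing a warning:

```python
import warnings

import numpy as np
import pandas as pd

a = np.array([[1, 2, 1, 8, 0],
              [7, 4, 8, 4, 2],
              [6, 6, 7, 2, 6],
              [2, 4, 4, 7, 4],
              [4, 4, 5, 1, 7]])
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=["i", "j"])

with warnings.catch_warnings():
    # Turn FutureWarning into an exception, but only inside this block
    warnings.simplefilter("error", FutureWarning)
    # Explicit positional access on the row Series
    by_iloc = df[["i", "j"]].apply(lambda x: a[x.iloc[0], x.iloc[1]], axis=1)
    # tuple(x) iterates the row's values, equivalent to a[(*x,)] on any Python 3
    by_tuple = df[["i", "j"]].apply(lambda x: a[tuple(x)], axis=1)

print(by_iloc.tolist(), by_tuple.tolist())  # [2, 8, 2] [2, 8, 2]
```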

You can even keep your approach and simply substitute the positional keys 0/1 with the labels i/j:

df["vals"] = df[["i", "j"]].apply(lambda x: a[x["i"], x["j"]], axis=1)
# this works because `x` is a Series of length 2 whose index labels are i/j
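Label-based access has a side benefit: it does not depend on column order. A sketch, assuming the index columns happen to be selected in reverse order (df[["j", "i"]] is a hypothetical variation, not from the question): positional iloc access then silently swaps row and column, while the label-based lambda still returns the intended values:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 1, 8, 0],
              [7, 4, 8, 4, 2],
              [6, 6, 7, 2, 6],
              [2, 4, 4, 7, 4],
              [4, 4, 5, 1, 7]])
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=["i", "j"])

swapped = df[["j", "i"]]  # same columns, reversed order

# Labels follow the column names, so the order does not matter
by_label = swapped.apply(lambda x: a[x["i"], x["j"]], axis=1)
# Positions follow the column order, so this now reads a[j, i]
by_position = swapped.apply(lambda x: a[x.iloc[0], x.iloc[1]], axis=1)

print(by_label.tolist())     # [2, 8, 2]
print(by_position.tolist())  # [7, 6, 4]
```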

But FWIW, in your context you should use NumPy's advanced indexing, which is vectorized and avoids apply altogether:

# hpaulj
df["vals"] = a[df["i"], df["j"]]

Output:

   i  j  vals
0  0  1     2
1  1  2     8
2  2  3     2

[3 rows x 3 columns]
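With advanced indexing, a[rows, cols] pairs the two index arrays element by element, giving a[rows[k], cols[k]] for each k. That differs from np.ix_, which builds the full cross product of the listed rows and columns. A minimal sketch contrasting the two (the to_numpy() conversion is optional here; NumPy also accepts the Series directly, as in the answer above):

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 1, 8, 0],
              [7, 4, 8, 4, 2],
              [6, 6, 7, 2, 6],
              [2, 4, 4, 7, 4],
              [4, 4, 5, 1, 7]])
df = pd.DataFrame([[0, 1], [1, 2], [2, 3]], columns=["i", "j"])

rows = df["i"].to_numpy()
cols = df["j"].to_numpy()

# Advanced indexing: one value per (row, col) pair -> shape (3,)
paired = a[rows, cols]
# The equivalent (slower) pure-Python loop, for comparison
looped = [a[r, c] for r, c in zip(rows, cols)]

# Cross product instead: every listed row with every listed column -> shape (3, 3)
grid = a[np.ix_(rows, cols)]

print(paired)             # [2 8 2]
print(grid.shape)         # (3, 3)
print(np.diagonal(grid))  # [2 8 2]: the paired values sit on the diagonal
```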