FutureWarning when applying a condition on a pandas dataframe to filter an array

Question

FutureWarning when applying a condition on a pandas dataframe to filter an array

640 Views Asked by pow At 23 September 2017 at 03:20

I have applied PCA to an array of around 1000 observations but only want to keep the observation in the new array IF one of the features from the original array = something.

I have a numpy array df2 and a dataframe df. I want to find all rows in df2 where df.Position is CDM.

My actual data:

df2

[[ -6.00987823e+00   4.46585005e+00]
 [ -7.09055159e+00   1.89437600e+00]
 [ -5.91044431e+00  -1.97888707e+00]
 [ -4.85698965e+00  -1.09936724e+00]
 [ -4.01780368e-01  -2.57178392e+00]
 [ -2.97351215e+00  -3.15940358e+00]
 [ -4.27973589e+00   2.82707326e+00]
 [  3.95086576e+00   1.08281922e+00]
 [ -2.94075361e+00  -1.95544661e+00]
 [ -4.83788056e+00   2.32369496e+00]
 [ -5.00473716e+00  -3.37680552e-01]
 [ -4.88905829e+00  -1.55527476e+00]
 [ -3.38202709e+00  -1.04402867e+00]
 [ -2.14261510e+00  -5.30757477e-01]
 [  3.00813803e-01  -2.11010985e+00]
 [ -2.67824986e+00  -1.83303905e+00]
 [ -1.64547049e+00  -2.48056250e+00]
 [ -2.92550543e+00  -3.02363170e+00]
 [ -4.01116933e+00   2.90363840e+00]
 [ -1.04571206e+00   7.58064433e-01]
 [  2.34068739e-01  -2.33981296e+00]
 [  3.15597517e+00   1.09429188e+00]
 [ -3.83828970e+00   1.14195305e-01]
 [ -7.33794066e-01  -3.70152816e+00]
 [  8.21789967e-01  -4.77818413e-01]
 [ -3.29257688e+00  -1.61887349e+00]
 [ -4.24297171e+00   2.27187714e+00]
 [  1.45714199e+00  -3.56024788e+00]
 [  1.79855738e+00  -3.71818328e-01]
 [  3.68171085e-01  -3.52961707e+00]
 [  3.77585412e+00  -3.01627595e-01]
 [ -4.21740128e+00  -1.30913719e+00]
 [ -3.85041585e+00  -1.05515969e+00]
 [ -5.01752378e+00   4.67348167e-01]
 [  3.65943448e+00   9.21016483e-01]
 [  3.12159896e+00  -1.25707872e-01]
 [ -4.50219722e+00  -4.06752784e+00]
 [ -3.92172250e+00  -2.88567430e+00]
 [ -2.68908475e-01  -2.17506629e+00]
 [ -1.13728112e+00  -2.66843007e+00]
 [ -8.73467957e-01  -1.24389494e+00]
 [  3.21966300e+00  -1.35271239e-01]
 [ -4.31060796e+00  -1.90505910e+00]
 [  3.73904981e+00   7.70228802e-01]
 [  1.02646986e+00  -5.91828676e-01]
 [  8.43840480e-01  -1.49636218e+00]
 [  1.54065978e+00  -1.65086030e+00]
 [  2.96602068e+00  -7.41024474e-01]
 [  6.53636345e-01   3.04647288e-01]
 [  2.59236989e+00  -6.70435261e-02]
 [  2.00184665e-01  -1.55230314e+00]
 [ -7.29533092e-01  -2.73390749e+00]
 [ -2.93578745e+00  -2.18118257e+00]
 [ -4.37481195e+00   1.02701222e+00]
 [  1.00713302e+00  -1.39943282e+00]
...]


df

(simply playing position in football/soccer - FB, CB, CDM, CM, AM, FW)

Position
FW
FW
FW
FW
FB
AM
FW
CB
AM
FW
AM
FW
AM
CM
FB
AM
CM
CM
FW
CM
CDM
CB
AM
FB
CDM
FW
FW
CDM
FB
CDM
CB
AM
...
AM

When filtering, I get this output (along with a FutureWarning):

Where am I going wrong and how can I filter the data appropriately?

Original Q&A

There are 2 best solutions below

jezrael On 23 September 2017 at 04:40

I think you need boolean indexing:

from sklearn.decomposition import PCA
import pandas as pd

d = {'d': [4, 5, 5],
     'a': [1, 2, 1], 
     'name': ['john', 'james', 'jane'], 
     'e': [5, 6, 7],
     'f': [6, 7, 8], 'c': [3, 4, 3], 
     'b': [2, 3, 2], 
     'colour': ['red', 'red', 'blue']}
cols = ['name', 'colour', 'a', 'b', 'c', 'd', 'e', 'f']
df = pd.DataFrame(d, columns = cols)
print (df)
    name colour  a  b  c  d  e  f
0   john    red  1  2  3  4  5  6
1  james    red  2  3  4  5  6  7
2   jane   blue  1  2  3  5  7  8

#create mask by condition
mask = df['colour'] == 'red'
#for multiple values
#mask = df['colour'].isin(['red', 'green', 'blue'])
print (mask)
0     True
1     True
2    False
Name: colour, dtype: bool

#filter only numeric values and convert to numpy array
arr = df.drop(['name','colour'], axis=1).values
print (arr)
[[1 2 3 4 5 6]
 [2 3 4 5 6 7]
 [1 2 3 5 7 8]]

pca = PCA(n_components=5)
pca.fit(arr)
print (pca.components_ )
[[-0.0463861  -0.0463861  -0.0463861  -0.35279184 -0.65919758 -0.65919758]
 [ 0.55515147  0.55515147  0.55515147  0.21897879 -0.11719389 -0.11719389]
 [ 0.62531284 -0.13184966 -0.136648   -0.71363037  0.17840759  0.17840759]]

#filter by condition
arr1 = pca.components_ [mask]
print (arr1)
[[-0.0463861  -0.0463861  -0.0463861  -0.35279184 -0.65919758 -0.65919758]
 [ 0.55515147  0.55515147  0.55515147  0.21897879 -0.11719389 -0.11719389]]

**this be Shiva** · Accepted Answer · 2017-09-23T04:36:27.270000

The FutureWarning is probably a result of your numpy and pandas versions being out of date. You can upgrade them using:

pip install --upgrade numpy pandas

As for the filtering, there are quite a few options. Here I mention each one with some dummy data.

Setup

df
    name colour  a  b  c  d  e  f
0   john    red  1  2  3  4  5  6
1  james    red  2  3  4  5  6  7
2   jane   blue  1  2  3  5  7  8

df2
       0      1
0  0.122  0.222
1  0.343  0.345
2  0.345  0.563

Option 1
boolean indexing

df2[df.colour == 'red']
Out[726]: 
       0       1
0  0.122   0.222
1  0.343   0.345

Option 2
df.eval

df2[df.eval('colour == "red"')]
Out[732]: 
       0       1
0  0.122   0.222
1  0.343   0.345

Note that both these options work even if df2 is a numpy array of the form:

array([[ 0.122,  0.222],
       [ 0.343,  0.345],
       [ 0.345,  0.563]])

For your actual data, you'll need to do something along the same lines:

df2

array([[-6.01 ,  4.466],
       [-7.091,  1.894],
       [-5.91 , -1.979],
       [-4.857, -1.099],
       [-0.402, -2.572],
       [-2.974, -3.159],
       [-4.28 ,  2.827],
       [ 3.951,  1.083],
       [-2.941, -1.955],
       [-4.838,  2.324],
       [-5.005, -0.338],
       [-4.889, -1.555],
       [-3.382, -1.044],
       [-2.143, -0.531],
       [ 0.301, -2.11 ],
       [-2.678, -1.833],
       [-1.645, -2.481],
       [-2.926, -3.024],
       [-4.011,  2.904],
       [-1.046,  0.758],
       [ 0.234, -2.34 ],
       [ 3.156,  1.094],
       [-3.838,  0.114],
       [-0.734, -3.702],
       [ 0.822, -0.478],
       [-3.293, -1.619],
       [-4.243,  2.272],
       [ 1.457, -3.56 ],
       [ 1.799, -0.372],
       [ 0.368, -3.53 ],
       [ 3.776, -0.302],
       [-4.217, -1.309]])

df

   Position
0        FW
1        FW
2        FW
3        FW
4        FB
5        AM
6        FW
7        CB
8        AM
9        FW
10       AM
11       FW
12       AM
13       CM
14       FB
15       AM
16       CM
17       CM
18       FW
19       CM
20      CDM
21       CB
22       AM
23       FB
24      CDM
25       FW
26       FW
27      CDM
28       FB
29      CDM
30       CB
31       AM

df2[df.Position == 'CDM']

array([[ 0.234, -2.34 ],
       [ 0.822, -0.478],
       [ 1.457, -3.56 ],
       [ 0.368, -3.53 ]])

FutureWarning when applying a condition on a pandas dataframe to filter an array

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in ARRAYS

Related Questions in PANDAS

Related Questions in NUMPY

Related Questions in FUTURE-WARNING

Trending Questions

Popular # Hahtags

Popular Questions