How to filter NaN values using Pandas?

99 Views Asked by At

Imagine the following dataframe:

import pandas as pd
df = pd.DataFrame(columns = ['A','B','C'], index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])
df.loc[['Monday', 'Friday'], ['A', 'C']] = 19

I want to print all columns and rows, that are not NaN. The output should be as follows:

A C
Monday 19 19
Friday 19 19

I have tried different methods, but none of them gives me the required result. I am doing something wrong here. Please help.

df = df[df.isna()]
df[(df.iloc[::]) & (df.isnull() == False)]
2

There are 2 best solutions below

1
Vikas Sharma On

Your code:

import pandas as pd
df = pd.DataFrame(columns = ['A','B','C'], index = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday'])
df.loc[['Monday', 'Friday'], ['A', 'C']] = 19
print(df)

Output:

             A    B    C
Monday      19  NaN   19
Tuesday    NaN  NaN  NaN
Wednesday  NaN  NaN  NaN
Thursday   NaN  NaN  NaN
Friday      19  NaN   19

Add this to your code:

df = df.stack().unstack()
print(df)

Output:

         A   C
Monday  19  19
Friday  19  19
0
mozway On

Assuming you want to remove rows/columns that are fully composed of NaNs use dropna on both axes:

out = df.dropna(how='all').dropna(how='all', axis=1)

Output:

         A   C
Monday  19  19
Friday  19  19

Timings

Comparison with df.stack().unstack() for a 1000x1000 DataFrame (50% NaNs):

# df.stack().unstack()
76 ms ± 3.73 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# df2.dropna(how='all').dropna(how='all', axis=1)
3.82 ms ± 84.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each