Pandas: Replace missing values in testing set by the mean of each group from the training set

Asked by wjosielct on 24 September 2023 at 17:11 (31 views)

I want to replace the missing values in the "X" column of the testing set according to the average of each category of the "Class" column, but these averages must come from the training set.

train:

| Class | X   |
| ---   | --- |
| A     | 10  |
| A     | NaN |
| A     | 20  |
| B     | 15  |
| B     | 17  |
| B     | NaN |

test:

| Class | X   |
| ---   | --- |
| A     | 11  |
| A     | NaN |
| B     | 25  |
| B     | NaN |

The idea is to use the averages of each group in the training set to replace the corresponding missing values in the testing set. In this case, the mean values of the column X for each category in the training set are:

Mean of X for Class A: 15
Mean of X for Class B: 16
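For reference, these two means can be reproduced with a minimal sketch like the one below. The frame name `train` and the variable `train_means` are only illustrative; the data is the training table above.

```python
import numpy as np
import pandas as pd

# Training set from the question (NaN marks the missing values)
train = pd.DataFrame(
    {
        "Class": ["A", "A", "A", "B", "B", "B"],
        "X": [10, np.nan, 20, 15, 17, np.nan],
    }
)

# mean() skips NaN by default, so each class mean uses only the observed values
train_means = train.groupby("Class")["X"].mean()
print(train_means)
```

The result is a Series indexed by `Class`, which is a convenient shape for looking up a fill value per category later.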

So, the final testing set should be transformed like this:

final_test:

| Class | X   |
| ---   | --- |
| A     | 11  |
| A     | 15  |
| B     | 25  |
| B     | 16  |

I used the groupby() function, but I don't know how to take the grouped values from the training set and use them to replace the missing values in the testing set.

Thanks a lot.

There is 1 best solution below

Answered by Andrej Kesely on 24 September 2023 at 17:15

Try:

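# Index the test set by "Class" so fillna aligns each row with its group mean.
df_test = (
    df_test.set_index("Class")
    .fillna(df_train.groupby("Class").mean())  # means come from the training set only
    .reset_index()
)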
print(df_test)

Prints:

  Class     X
0     A  11.0
1     A  15.0
2     B  25.0
3     B  16.0
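An alternative that may be easier to adapt when the frames have more columns is to map each test row's class to its training mean and fill only the one column. This is a sketch under the assumption that the data matches the tables in the question; `class_means` is an illustrative name, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Data from the question
df_train = pd.DataFrame(
    {
        "Class": ["A", "A", "A", "B", "B", "B"],
        "X": [10, np.nan, 20, 15, 17, np.nan],
    }
)
df_test = pd.DataFrame(
    {
        "Class": ["A", "A", "B", "B"],
        "X": [11, np.nan, 25, np.nan],
    }
)

# Per-class means learned from the training set only
class_means = df_train.groupby("Class")["X"].mean()

# Look up the training mean for each test row, then fill only the missing entries
df_test["X"] = df_test["X"].fillna(df_test["Class"].map(class_means))
print(df_test)
```

Note that a class present in the test set but absent from the training set has no mean to map to, so its missing values would stay NaN with this approach.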
