How to perform a Box-Cox transformation on a single column in Python


I am trying to apply a Box-Cox transformation to a single column, but the code below fails. Can somebody help me with this issue?

from sklearn.datasets import fetch_california_housing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PowerTransformer

california_housing = fetch_california_housing(as_frame=True).frame
california_housing

power = PowerTransformer(method='box-cox', standardize=True)
california_housing['MedHouseVal']=power.fit_transform(california_housing['MedHouseVal'])

Michael Hodel

For a single feature, power.fit_transform requires input of shape (n, 1) rather than (n,). Here california_housing['MedHouseVal'] is a pd.Series and therefore has shape (n,). This can be achieved either by reshaping, i.e. by replacing

power.fit_transform(california_housing['MedHouseVal'])

with

power.fit_transform(california_housing['MedHouseVal'].to_numpy().reshape(-1, 1))

or, alternatively and a bit more readably, by selecting a list of columns with california_housing[['MedHouseVal']] (which gives a pd.DataFrame) instead of a single column with california_housing['MedHouseVal'] (which gives a pd.Series), i.e. by using

power.fit_transform(california_housing[['MedHouseVal']])

Note that

print(california_housing['MedHouseVal'].shape)
print(california_housing[['MedHouseVal']].shape)

prints

(20640,)
(20640, 1)
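Putting the pieces together, here is a minimal sketch of the whole round trip. To keep it independent of the dataset download, it uses a small synthetic, strictly positive column as a stand-in for MedHouseVal; the synthetic data and the new column name MedHouseVal_bc are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Synthetic stand-in for the real column (assumption: any strictly positive data works).
rng = np.random.default_rng(0)
df = pd.DataFrame({"MedHouseVal": rng.lognormal(mean=0.5, sigma=0.6, size=1000)})

power = PowerTransformer(method="box-cox", standardize=True)

# Double brackets select a one-column DataFrame, so the input has shape (n, 1).
transformed = power.fit_transform(df[["MedHouseVal"]])

# The result also has shape (n, 1); flatten it before storing it as a column.
df["MedHouseVal_bc"] = transformed.ravel()

# One fitted lambda per input column.
fitted_lambda = power.lambdas_[0]
print(transformed.shape, fitted_lambda)
```

Writing to a new column rather than overwriting the original keeps the untransformed values around for comparison. Note that Box-Cox only accepts strictly positive input.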

Another option is to use scipy.stats.boxcox:

from sklearn.datasets import fetch_california_housing
from scipy.stats import boxcox

california_housing = fetch_california_housing(as_frame=True).frame
california_housing['MedHouseVal'] = boxcox(california_housing['MedHouseVal'])[0]
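One difference worth keeping in mind: scipy.stats.boxcox takes 1-D input directly and returns both the transformed values and the fitted lambda, but it does not standardize the result the way PowerTransformer(standardize=True) does. The sketch below shows this on synthetic, strictly positive data (an assumption standing in for the real column) and standardizes by hand; the variable names are illustrative only.

```python
import numpy as np
from scipy.special import inv_boxcox
from scipy.stats import boxcox

# Synthetic, strictly positive sample (assumption: stands in for MedHouseVal).
rng = np.random.default_rng(0)
values = rng.gamma(shape=2.0, scale=1.5, size=1000)

# With no lmbda argument, boxcox fits lambda and returns (transformed, lambda).
transformed, fitted_lambda = boxcox(values)

# boxcox does not rescale its output; standardize manually if needed.
standardized = (transformed - transformed.mean()) / transformed.std()

# Keeping lambda makes it possible to map back to the original scale.
recovered = inv_boxcox(transformed, fitted_lambda)
print(fitted_lambda, np.allclose(recovered, values))
```

Taking only index [0], as in the snippet above, discards the fitted lambda, so unpack both return values if the inverse transformation is needed later.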