I am trying to apply a Box-Cox transformation to a single column, but I am unable to do so. Can somebody help me with this issue?
from sklearn.datasets import fetch_california_housing
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.preprocessing import PowerTransformer
california_housing = fetch_california_housing(as_frame=True).frame
california_housing
power = PowerTransformer(method='box-cox', standardize=True)
california_housing['MedHouseVal']=power.fit_transform(california_housing['MedHouseVal'])
The function `power.fit_transform` requires the input for a single feature to have shape `(n, 1)` instead of `(n,)`. Here `california_housing['MedHouseVal']` is a `pd.Series`, so it has shape `(n,)`. This can be fixed either by reshaping, for example by passing `california_housing['MedHouseVal'].to_numpy().reshape(-1, 1)`, or, a bit more readably, by selecting a list of columns with `california_housing[['MedHouseVal']]` (which gives a `pd.DataFrame`) instead of a single column with `california_housing['MedHouseVal']` (which gives a `pd.Series`).
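The two fixes can be sketched as follows. To keep the example runnable without downloading the dataset, it uses a small synthetic, strictly positive column as a stand-in for `MedHouseVal` (an assumption for illustration only), and the `MedHouseVal_bc` column name is likewise hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Synthetic, strictly positive stand-in for the 'MedHouseVal' column
# (assumption: used only so the example runs without the real dataset).
rng = np.random.default_rng(0)
df = pd.DataFrame({"MedHouseVal": rng.lognormal(mean=0.5, sigma=0.6, size=200)})

print(df["MedHouseVal"].shape)    # (200,)   -> pd.Series, rejected by fit_transform
print(df[["MedHouseVal"]].shape)  # (200, 1) -> pd.DataFrame, accepted

power = PowerTransformer(method="box-cox", standardize=True)

# Option 1: reshape the underlying array to a single column.
reshaped = power.fit_transform(df["MedHouseVal"].to_numpy().reshape(-1, 1))

# Option 2: select a list of columns, which yields a DataFrame.
selected = power.fit_transform(df[["MedHouseVal"]])

# Both results are (n, 1) arrays; flatten before storing as a single column.
df["MedHouseVal_bc"] = selected.ravel()
```

Both options should give the same transformed values, so the choice is mostly a matter of readability.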
Another option would be to use `scipy.stats.boxcox`, which accepts one-dimensional input directly.
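A minimal sketch of the SciPy route, again on a synthetic positive column standing in for `MedHouseVal` (an assumption for illustration; the `MedHouseVal_bc` name is hypothetical):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic, strictly positive stand-in for the 'MedHouseVal' column.
rng = np.random.default_rng(0)
df = pd.DataFrame({"MedHouseVal": rng.lognormal(mean=0.5, sigma=0.6, size=200)})

# boxcox expects positive 1-D data, so no reshaping is needed.
# With lmbda left unset, it returns the transformed values and the fitted lambda.
transformed, fitted_lambda = stats.boxcox(df["MedHouseVal"].to_numpy())

df["MedHouseVal_bc"] = transformed
```

Unlike `PowerTransformer(standardize=True)`, `scipy.stats.boxcox` does not standardize the result, so subtract the mean and divide by the standard deviation afterwards if zero mean and unit variance are needed.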