Retaining the target class during PCA in the auto dataset

86 Views Asked by David At 16 February 2023 at 15:04

I am trying to find the correct way to retain the target class during a PCA, or at least to confirm that I have retained it. I tried scaling both before and after splitting the data, but the issue is the same either way.

Sorry, I can't use seaborn.load_dataset(name, cache=True, data_home=None, **kws) to load this dataset, so here is how I load and prepare it.

Loading the data

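# imports used by the snippets below
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale

# loading the dataframe
auto = pd.read_csv('auto.csv')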

Make a target class: any mpg above the median is 1, everything else is 0

med = np.median(auto["mpg"])
auto["mpg01"] = auto["mpg"].apply(lambda x: 1 if x > med else 0)

Splitting the data

X = auto[['cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'year', 'origin']]
y = auto["mpg01"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=101, test_size=0.3, shuffle=True)

Start the PCA

pca2 = PCA(n_components=2)
X_train_reduced2 = pca2.fit_transform(scale(X_train))

Make a DataFrame that joins the PCs and the target class

pca_df2 = pd.DataFrame(X_train_reduced2, columns =["PC1", "PC2"])
pca_df2["mpg01"]=y_train
pca_df2

I noticed that there are some NaNs in this new dataframe. The length of the dataframe makes sense. The only thing I can think of is that the index no longer matches, but it should, and I have no way to verify it.

The 2D plot of the PCA shows no separation between the target classes. I am just wondering if I got all the steps right.


There is 1 best solution below

Deusy94 On 16 February 2023 at 15:53

As you said, the indexes no longer match. You need to change the line that builds the dataframe to:

pca_df2 = pd.DataFrame(X_train_reduced2, columns=["PC1", "PC2"], index=X_train.index)

Note that PCA does not return a pd.DataFrame, but a plain NumPy array, so the new dataframe gets a default 0..n-1 index. You need to set its index explicitly so that it matches the index of the labels in y_train.
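To make the alignment issue concrete, here is a minimal sketch with toy data. The names `scores`, `broken`, `fixed` and `positional` are made up for illustration, and a random array stands in for the PCA output, so this is not the asker's actual data. After a shuffled split, `y_train` keeps its original row labels, and it shares them with `X_train`, so using `y_train.index` here plays the same role as `X_train.index` above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for y_train after a shuffled split: the labels keep their
# original row labels (7, 2, 9, 0, 5), not 0..4.
y_train = pd.Series([1, 0, 1, 0, 1], index=[7, 2, 9, 0, 5], name="mpg01")

# Stand-in for the PCA output: a plain NumPy array that carries no index.
rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 2))

# Problem: the new frame gets a default index 0..4, and column assignment
# aligns on index labels, so rows whose label is absent from y_train get NaN.
broken = pd.DataFrame(scores, columns=["PC1", "PC2"])
broken["mpg01"] = y_train
n_missing = int(broken["mpg01"].isna().sum())

# Fix 1 (the one suggested above): reuse the training index.
fixed = pd.DataFrame(scores, columns=["PC1", "PC2"], index=y_train.index)
fixed["mpg01"] = y_train

# Fix 2 (an alternative, assuming row order is unchanged): assign by position.
positional = pd.DataFrame(scores, columns=["PC1", "PC2"])
positional["mpg01"] = y_train.to_numpy()

print(n_missing, int(fixed["mpg01"].isna().sum()))
```

In the broken frame, the rows that do not come out as NaN are matched by label rather than by position, so they can carry the label of a different observation. That may be part of why the classes looked mixed in the plot, though the plot itself cannot be checked here. Comparing `X_train.index` with the default index of the new frame is one way to verify the mismatch.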
