How to compute weights used in computing the principal components in PCA


I apply PCA to a time series of shape T x N. I want to recompute the first PC from the loadings and compare it to the original PC. This is what I have tried so far:

import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Sample input data (replace this with actual data)
data = pd.DataFrame(np.random.rand(100, 20), columns=[f'Feature_{i}' for i in range(1, 21)])

# Standardize the data
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)

# Perform PCA with 10 components
pca = PCA(n_components=10)
pca.fit(data_standardized)

# Get loadings for the first principal component
loadings_first_component = pca.components_[0]

# Extract the square root of the explained variance for the first component
explained_variance_sqrt = np.sqrt(pca.explained_variance_[0])

# Scale the loadings by the square root of explained variance
loadings_first_component_scaled = loadings_first_component * explained_variance_sqrt

# Extract the first principal component
first_pc_original = pca.transform(data_standardized)[:, 0]

# Recompute the first principal component using loadings and input data
first_pc_recomputed = np.dot(data_standardized, loadings_first_component_scaled)

# Check if the recomputed first principal component is equal to the original
is_equal = np.allclose(first_pc_original, first_pc_recomputed)

print("Original First Principal Component:")
print(first_pc_original)
print("\nRecomputed First Principal Component:")
print(first_pc_recomputed)
print("\nAre they equal?", is_equal)

But the original and recomputed PCs are not the same. My end goal is to find the weights used in the linear combination that produces the first PC. Initially I thought that pca.components_[0] holds the weights, but I think this is wrong.


There is 1 best solution below

Answered by nicrie

Short answer

You are projecting your standardized data onto the scaled loadings, but the scores come from projecting the data onto the unscaled loadings (the eigenvectors). So pca.components_[0] does hold the weights you are looking for.

Replace

first_pc_recomputed = np.dot(data_standardized, loadings_first_component_scaled)

with

first_pc_recomputed = np.dot(data_standardized, loadings_first_component)
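The replacement can be checked end to end with a minimal sketch. It assumes synthetic random data in place of the real T x N series; the names X_std, weights and weights_norm are illustrative and not from the question.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)             # fixed seed, for illustration only
X = rng.random((100, 20))                  # synthetic stand-in for the T x N data
X_std = StandardScaler().fit_transform(X)  # zero mean, unit variance per column

pca = PCA(n_components=10).fit(X_std)

weights = pca.components_[0]               # unscaled loadings of the first PC
first_pc_original = pca.transform(X_std)[:, 0]
first_pc_recomputed = X_std @ weights      # linear combination of the columns

is_equal = np.allclose(first_pc_original, first_pc_recomputed)
weights_norm = np.linalg.norm(weights)     # eigenvectors have unit length

print("Are they equal?", is_equal)
print("Norm of the weight vector:", weights_norm)
```

The weight vector has unit length, which is why multiplying it by the square root of the explained variance changes the scale of the result and breaks the comparison.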

Explanation

PCA decomposes your centered data (X) into principal component axes (A) and principal component scores (S):

X = SA.T

where the columns of A are orthonormal and .T denotes the transpose. What you're interested in is:

S = XA
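The step between the two formulas can be written out as follows. This is a sketch assuming X is centered and the columns of A are orthonormal; the symbols a_{j1} and x_j (entry j of the first axis and column j of X) are introduced here for illustration.

```latex
X = S A^{\top}, \qquad A^{\top} A = I
\;\Longrightarrow\;
X A = S A^{\top} A = S

% First column only: the first PC is a weighted sum of the N input columns
s_1 = X a_1 = \sum_{j=1}^{N} a_{j1} \, x_j
```

When fewer components than features are kept, X = S A.T only holds approximately, but S = X A for the kept columns is still exact, because each kept axis is one column of the full orthogonal matrix.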

In sklearn the results are as follows:

  • pca.components_ -> A.T (shape n_components x n_features, one axis per row)
  • pca.transform(X) -> S

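So in order to get S, just project X onto A using np.dot(X, pca.components_.T). Note that pca.transform() subtracts pca.mean_ before projecting; your data is already standardized, so the mean is zero and this makes no difference here.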
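For data that has not been standardized, the centering step matters. The sketch below assumes synthetic, deliberately uncentered data and the default whiten=False; the variable names are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)         # fixed seed, for illustration only
X_raw = rng.random((100, 20)) + 5.0    # synthetic data, deliberately not centered

pca = PCA(n_components=10).fit(X_raw)
A = pca.components_.T                  # shape (n_features, n_components)

S_sklearn = pca.transform(X_raw)       # scores as returned by sklearn
S_manual = (X_raw - pca.mean_) @ A     # center first, then project
S_uncentered = X_raw @ A               # projection without centering

matches_with_centering = np.allclose(S_sklearn, S_manual)
matches_without_centering = np.allclose(S_sklearn, S_uncentered)
columns_orthonormal = np.allclose(A.T @ A, np.eye(A.shape[1]))

print("Match with centering:", matches_with_centering)
print("Match without centering:", matches_without_centering)
print("Columns of A orthonormal:", columns_orthonormal)
```

Skipping the centering shifts every score in a component by the same constant, so the weights are unchanged but the recomputed PC no longer matches pca.transform().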