I apply PCA to a time series of shape T x N. I want to recompute the first PC using the loadings and compare it to the original PC. So far this is what I have tried:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Sample input data (replace this with actual data)
data = pd.DataFrame(np.random.rand(100, 20), columns=[f'Feature_{i}' for i in range(1, 21)])
# Standardize the data
scaler = StandardScaler()
data_standardized = scaler.fit_transform(data)
# Perform PCA with 10 components
pca = PCA(n_components=10)
pca.fit(data_standardized)
# Get loadings for the first principal component
loadings_first_component = pca.components_[0]
# Extract the square root of the explained variance for the first component
explained_variance_sqrt = np.sqrt(pca.explained_variance_[0])
# Scale the loadings by the square root of explained variance
loadings_first_component_scaled = loadings_first_component * explained_variance_sqrt
# Extract the first principal component
first_pc_original = pca.transform(data_standardized)[:, 0]
# Recompute the first principal component using loadings and input data
first_pc_recomputed = np.dot(data_standardized, loadings_first_component_scaled)
# Check if the recomputed first principal component is equal to the original
is_equal = np.allclose(first_pc_original, first_pc_recomputed)
print("Original First Principal Component:")
print(first_pc_original)
print("\nRecomputed First Principal Component:")
print(first_pc_recomputed)
print("\nAre they equal?", is_equal)
But the original and the recomputed PC are not the same. The end goal of what I am doing is to find the weights used in the linear combination that produces the first PC. Initially I thought that pca.components_[0] contains those weights, but I think this is wrong.
Short answer
You project your standardized data onto the scaled loadings, when you actually want to project it onto the unscaled loadings.
Replace

first_pc_recomputed = np.dot(data_standardized, loadings_first_component_scaled)

with

first_pc_recomputed = np.dot(data_standardized, loadings_first_component)
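As a quick sanity check (a minimal sketch reusing the variables already defined in your code above), the recomputed scores should now match the output of pca.transform up to floating-point precision:

# Recompute the first PC with the unscaled loadings
first_pc_recomputed = np.dot(data_standardized, loadings_first_component)
# Compare against the first column returned by pca.transform
print(np.allclose(pca.transform(data_standardized)[:, 0], first_pc_recomputed))  # True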
Explanation
PCA decomposes your data (X) into principal component axes (A) and principal component scores (S):
X = SA.T
where A is an orthonormal matrix and .T denotes its transpose. What you're interested in is:
S = XA
In sklearn the results map onto this notation as follows:

pca.components_   -> A.T (the axes are stored as rows)
pca.transform(X)  -> S

So in order to get S, just project X onto A using np.dot(X, A), i.e. np.dot(X, pca.components_.T).
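Putting it all together, here is a minimal sketch (using random placeholder data with the same shape as in your question, T=100, N=20) that recovers all 10 score columns from the loadings, reconstructs an approximation of X, and extracts the weights of the linear combination for the first PC:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random placeholder data, same shape as in the question
data = pd.DataFrame(np.random.rand(100, 20),
                    columns=[f'Feature_{i}' for i in range(1, 21)])
X = StandardScaler().fit_transform(data)

pca = PCA(n_components=10)
S = pca.fit_transform(X)          # scores, shape (100, 10)
A = pca.components_.T             # axes as columns, shape (20, 10)

# S = X A: project the standardized data onto the (unscaled) axes
S_recomputed = np.dot(X, A)
print(np.allclose(S, S_recomputed))   # True

# X is approximately S A.T: lossy reconstruction, since only 10 of 20 components are kept
X_approx = np.dot(S, A.T)
print(X_approx.shape)                 # (100, 20)

# The weights of the linear combination for the first PC are the first column of A,
# i.e. pca.components_[0]
weights_first_pc = A[:, 0]
print(np.allclose(np.dot(X, weights_first_pc), S[:, 0]))   # True

Note that this works because StandardScaler has already centered X; pca.transform subtracts the training mean internally, which is effectively zero here.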