Having computed a Pearson correlation matrix in R, I would like to simplify my data set by finding the pairs of indicators whose correlation is greater than 0.7 or less than -0.7, and by eliminating the redundant indicator in each such pair.
Thank you.
I tried the following code, but I'm not convinced it is correct:
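```r
# Calculate the correlation matrix with the Pearson method
cor_matrix <- cor(tab, method = "pearson")
n <- ncol(cor_matrix)

# Mark variables to eliminate instead of deleting columns inside the loop:
# deleting from `tab` shifts its column indices so they no longer match `cor_matrix`
to_drop <- rep(FALSE, n)

# Browse the upper triangle of the correlation matrix
for (i in seq_len(n - 1)) {
  if (to_drop[i]) next  # variable i has already been eliminated
  for (j in (i + 1):n) {
    # Check if the correlation is greater than 0.7 or less than -0.7
    if (!to_drop[j] && abs(cor_matrix[i, j]) > 0.7) {
      # The matrix is symmetric (cor_matrix[i, j] == cor_matrix[j, i]),
      # so comparing those two entries cannot choose a variable: keep i, eliminate j
      to_drop[j] <- TRUE
    }
  }
}

# Keep the remaining columns; drop = FALSE keeps a data frame even if one column is left
tab <- tab[, !to_drop, drop = FALSE]

# View the simplified table
print(tab)
```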
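Before eliminating anything, it can help to list which pairs actually meet the criterion. The sketch below assumes `tab` is a numeric data frame; `high_cor_pairs` is a hypothetical helper name, and `mtcars` is only used as stand-in data.

```r
# List every pair of variables whose Pearson correlation is above `cutoff`
# or below -`cutoff`. `high_cor_pairs` is a hypothetical helper, not a package function.
high_cor_pairs <- function(tab, cutoff = 0.7) {
  cm <- cor(tab, method = "pearson")
  # Restrict to the upper triangle so each pair is reported once (and the diagonal is skipped)
  idx <- which(upper.tri(cm) & abs(cm) > cutoff, arr.ind = TRUE)
  data.frame(
    var1 = rownames(cm)[idx[, "row"]],
    var2 = colnames(cm)[idx[, "col"]],
    r    = cm[idx],
    stringsAsFactors = FALSE
  )
}

# Example with the built-in mtcars data set
print(high_cor_pairs(mtcars))
```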
Check whether each absolute correlation is greater than 0.7, take the column-wise sums to get the indices of the columns to keep, then subset:
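A minimal sketch of that idea, assuming `tab` is a numeric data frame; `drop_high_cor` is a hypothetical helper name and this is not necessarily the exact code the answer had in mind.

```r
# Keep only columns that are not strongly correlated with any earlier column.
# `drop_high_cor` is a hypothetical helper, not a function from any package.
drop_high_cor <- function(tab, cutoff = 0.7) {
  # TRUE where the absolute Pearson correlation exceeds the cutoff
  high <- abs(cor(tab, method = "pearson")) > cutoff
  # Keep only pairs (i, j) with i < j: this clears the diagonal (always 1)
  # and the lower triangle, which mirrors the upper one
  high[lower.tri(high, diag = TRUE)] <- FALSE
  # Column j has a non-zero sum if it is strongly correlated with some earlier column
  keep <- colSums(high) == 0
  tab[, keep, drop = FALSE]
}

print(names(drop_high_cor(mtcars)))
```

Note that this drops a column whenever it is strongly correlated with any earlier column, even one that is itself dropped, so it can remove more columns than a loop that skips already-eliminated variables.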