Eliminate redundant data from a Pearson correlation in R


After computing a Pearson correlation matrix in R, I would like to simplify my data set: find the pairs of indicators whose correlation is greater than 0.7 or less than -0.7, and eliminate the redundant indicators.

thank you

I tried this code but I’m not convinced...

# Calculate the correlation matrix with the Pearson method
cor_matrix <- cor(tab, method = "pearson")

# Browse the correlation matrix
for (i in 1:(ncol(cor_matrix) - 1)) {
  for (j in (i + 1):ncol(cor_matrix)) {
    # Check if the correlation is greater than 0.7 or less than -0.7
    if (cor_matrix[i, j] > 0.7 || cor_matrix[i, j] < -0.7) {
      # Compare absolute correlations
      if (abs(cor_matrix[i, j]) > abs(cor_matrix[j, i])) {
        # Eliminate variable j
        tab <- tab[, -j]
      } else {
        # Eliminate variable i
        tab <- tab[, -i]
      }
    }
  }
}

# View the simplified table
print(tab)

There is 1 solution below

Answered by zx8754

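Check which absolute correlations are greater than 0.7, then take the column-wise sum. The diagonal is always 1, so a column with no strong correlation to any other variable has a count of exactly 1. Use that to get the names of the columns to keep, then subset:

# cor_matrix is a matrix, so use colnames(); names() would return NULL
cc <- colnames(cor_matrix)[ colSums(abs(cor_matrix) > 0.7) == 1 ]
cor_matrix[ cc, cc ]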
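
Note that this filter drops every variable that takes part in a strongly correlated pair, while the question asks to drop only the redundant member of each pair. The loop in the question cannot do that as written: the correlation matrix is symmetric, so abs(cor_matrix[i, j]) > abs(cor_matrix[j, i]) is never TRUE, and deleting columns from tab inside the loop shifts the indices away from those of cor_matrix. Below is a minimal sketch of one possible greedy alternative, not a definitive method. The function name drop_redundant, its cutoff argument, and the demo data are hypothetical, and it assumes tab is a data frame of named numeric columns without missing values.

```r
# Illustrative sketch: repeatedly take the most strongly correlated pair
# and drop one of its two members until no |r| exceeds the cutoff.
# `drop_redundant` and `cutoff` are hypothetical names, not from the original post.
drop_redundant <- function(tab, cutoff = 0.7) {
  cm <- abs(cor(tab, method = "pearson"))
  diag(cm) <- 0                          # ignore self-correlation
  keep <- colnames(cm)
  repeat {
    sub <- cm[keep, keep, drop = FALSE]
    if (length(keep) < 2 || max(sub) <= cutoff) break
    # Row/column position of the strongest remaining pair
    idx <- which(sub == max(sub), arr.ind = TRUE)[1, ]
    pair <- keep[idx]
    # Of the two, drop the one with the larger mean |r| to the remaining variables
    worst <- pair[which.max(rowMeans(sub[pair, , drop = FALSE]))]
    keep <- setdiff(keep, worst)
  }
  tab[, keep, drop = FALSE]
}

# Made-up demo data: `a` and `b` are nearly identical, `c` is independent
set.seed(1)
x <- rnorm(100)
demo <- data.frame(a = x, b = x + rnorm(100, sd = 0.1), c = rnorm(100))
names(drop_redundant(demo))   # `c` plus one of `a` / `b`
```

Which member of a pair gets dropped is a design choice. Here it is the one with the higher mean absolute correlation to the remaining variables, but a domain-specific rule may suit the data better. If the caret package is available, caret::findCorrelation(cor_matrix, cutoff = 0.7) offers a similar heuristic.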