I want to identify all chains of items that satisfy the following conditions:
- The correlation coefficient between every pair of items in a chain should be positive and significant (p value < corrected α; say, corrected α = 0.01).
- The correlation coefficient of every pair of items in the chain should decrease as the number of items between them grows. That is, for the i-th and (i+k)-th item (where k is the distance between the two items in the chain, to either the right or the left), the correlation coefficient should be smaller than that of the i-th and (i+(k-1))-th item and greater than that of the i-th and (i+(k+1))-th item.
- An item may appear at any position in the chain (regardless of its order in the original data set), but only once per chain.
- I'm only interested in the longest chains, i.e. any chain that is part of another, longer chain (possibly with additional items in between) should be removed.
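To make conditions 1, 3 and 4 concrete, here is a minimal sketch of a checker for one candidate ordering. It assumes `r` and `p` are square matrices with the item names as dimnames (like `z$r` and `z$P` from `rcorr()`), and it reads condition 3 as a strict decrease with distance, seen from every item in both directions. The name `is_valid_chain` is hypothetical.

```r
# Hypothetical helper: does the ordered character vector `chain` satisfy
# conditions 1, 3 and 4? `r` and `p` are square matrices whose dimnames are
# the item names (e.g. z$r and z$P from Hmisc::rcorr()).
is_valid_chain <- function(chain, r, p, alpha = 0.01) {
  m <- length(chain)
  # A chain needs at least two items; condition 4: no item repeated
  if (m < 2 || anyDuplicated(chain) > 0) return(FALSE)
  rr <- r[chain, chain]
  pp <- p[chain, chain]
  off <- row(rr) != col(rr)  # off-diagonal cells only (rcorr's P has NA there)
  # Condition 1: every pair positively and significantly correlated
  if (any(rr[off] <= 0) || any(pp[off] >= alpha)) return(FALSE)
  # Condition 3: seen from each item, r strictly decreases with distance,
  # both to the right and to the left
  for (i in seq_len(m)) {
    right <- rr[i, seq_len(m) > i]        # distances 1, 2, ... to the right
    left  <- rev(rr[i, seq_len(m) < i])   # distances 1, 2, ... to the left
    if (any(diff(right) >= 0) || any(diff(left) >= 0)) return(FALSE)
  }
  TRUE
}
```

Testing every permutation would mean calling such a function on each ordering; the point of the sketch is only to pin down what "valid" means.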
My first idea for identifying such "correlation chains" was to test all possible permutations of length 3 up to n (the number of items in the data set). However, I doubt that this exhaustive search is the most efficient approach, since the number of permutations grows factorially with n. Building candidate chains up from scratch might be better, but I'm still unsure how to do that efficiently in R. I'd be very grateful if you could suggest a way!
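One way to build chains up from scratch, sketched under my own assumptions: a depth-first search that appends one item at a time to the right end and abandons a branch as soon as a condition fails. Every prefix of a valid chain is itself valid, so no valid chain is missed. Dropping items from a valid chain also keeps it valid, so a chain is part of a longer one exactly when some unused item can be inserted somewhere; the final filter relies on that. `chain_ok` and `find_chains` are hypothetical names, and the search is still exponential in the worst case; pruning only helps in practice.

```r
# Sketch (assumption): depth-first search for maximal "correlation chains".
# `r`, `p`: square matrices with item names as dimnames (e.g. z$r, z$P).

# Hypothetical helper: TRUE if the ordered vector `chain` meets conditions 1, 3, 4
chain_ok <- function(chain, r, p, alpha) {
  if (anyDuplicated(chain) > 0) return(FALSE)
  rr <- r[chain, chain, drop = FALSE]
  pp <- p[chain, chain, drop = FALSE]
  off <- row(rr) != col(rr)                    # skip the diagonal
  if (any(rr[off] <= 0) || any(pp[off] >= alpha)) return(FALSE)
  m <- length(chain)
  for (i in seq_len(m)) {
    right <- rr[i, seq_len(m) > i]             # distances 1, 2, ... to the right
    left  <- rev(rr[i, seq_len(m) < i])        # distances 1, 2, ... to the left
    if (any(diff(right) >= 0) || any(diff(left) >= 0)) return(FALSE)
  }
  TRUE
}

find_chains <- function(r, p, alpha = 0.01, min_len = 3) {
  items <- colnames(r)
  leaves <- list()  # chains that cannot be extended at their right end

  # Grow `chain` by one item at the right; prune as soon as a check fails
  extend <- function(chain) {
    grown <- FALSE
    for (x in setdiff(items, chain)) {
      cand <- c(chain, x)
      if (chain_ok(cand, r, p, alpha)) {
        grown <- TRUE
        extend(cand)
      }
    }
    if (!grown) leaves[[length(leaves) + 1]] <<- chain
  }
  for (a in items) extend(a)

  # A chain is part of a longer one iff some unused item can be inserted
  # at some position (including both ends) without breaking a condition
  is_maximal <- function(chain) {
    for (x in setdiff(items, chain)) {
      for (pos in 0:length(chain)) {
        if (chain_ok(append(chain, x, after = pos), r, p, alpha)) return(FALSE)
      }
    }
    TRUE
  }

  Filter(function(ch) length(ch) >= min_len &&
           ch[1] < ch[length(ch)] &&   # keep one of each mirror-image pair
           is_maximal(ch),
         leaves)
}
```

With the example data this would be called as `find_chains(z$r, z$P, alpha = 0.01)`. A chain read backwards is also valid, so each chain is reported in only one orientation.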
Here is an example data set we could use:
```r
library(Hmisc)
z <- rcorr(as.matrix(mtcars))
z$r  # correlation coefficients
z$P  # p values of the correlation tests
```
I use the function `rcorr()` from the Hmisc package to compute the matrices of correlation coefficients and p values. Many thanks in advance for your suggestions!
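If Hmisc is not available, a base-R sketch along these lines should give the same kind of r and P matrices (Pearson correlation with a two-sided t-test, which I believe is what `rcorr()` uses), assuming no missing values, as in mtcars. `cor_with_p` is a hypothetical helper name. The last line turns condition 1 into an adjacency matrix: every chain has to be a clique in the graph of positively and significantly correlated pairs, which is a cheap way to rule out items before any search.

```r
# Sketch (assumption): base-R substitute for Hmisc::rcorr() on complete data
cor_with_p <- function(x) {
  x <- as.matrix(x)
  r <- cor(x)                                   # Pearson correlation matrix
  p <- matrix(NA_real_, ncol(x), ncol(x), dimnames = dimnames(r))
  for (i in seq_len(ncol(x) - 1)) {
    for (j in (i + 1):ncol(x)) {
      # two-sided Pearson test; fill both triangles, leave the diagonal NA
      p[i, j] <- p[j, i] <- cor.test(x[, i], x[, j])$p.value
    }
  }
  list(r = r, P = p)
}

z <- cor_with_p(mtcars)
# Condition 1 as a graph: TRUE where a pair is positively and significantly
# correlated (alpha = 0.01); the NA diagonal becomes FALSE
adj <- z$r > 0 & !is.na(z$P) & z$P < 0.01
```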