I came across this formula in a text that says $S$ is the sample covariance matrix where
$$S = \sum_{j=1}^n(\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j-\bar{\mathbf{X}})'$$
What I am trying to figure out is how to calculate that equation in R. For example, if I had the following:
x <- c(1, 3, 5, 2)
y <- c(2, 3, 8, 7)
z <- c(22, 1, 3, 3)
X <- cbind(x, y, z)
I assume I can just use the cov() function and get
> cov(X)
           x          y         z
x   2.916667   3.333333 -11.25000
y   3.333333   8.666667 -17.66667
z -11.250000 -17.666667  97.58333
I also saw this calculation based on the above formula:
xbar <- apply(X, 2, mean)
d <- as.matrix(t(t(X) - xbar))
s2 <- matrix(0, 3, 3)
for (i in 1:3) {
s2 <- s2 + (d[i, ]) %*% t(d[i, ])
}
> s2
            x     y        z
[1,]   8.1875  11.5 -36.9375
[2,]  11.5000  22.0 -44.5000
[3,] -36.9375 -44.5 274.6875
but as you can see, the two do not return the same sample covariance matrix. I am having a hard time figuring out which is the correct way to calculate that equation, or if neither is correct.
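One way to compare the two directly is sketched below. It is not from the text. It rests on two observations that may or may not be the whole story: the loop above runs over `1:3`, although `X` has four rows, and `cov()` divides by n - 1, whereas the formula as written has no such factor. The names `S_sum` and `S_scaled` are made up for this illustration.

```r
# Same data as above
x <- c(1, 3, 5, 2)
y <- c(2, 3, 8, 7)
z <- c(22, 1, 3, 3)
X <- cbind(x, y, z)

n <- nrow(X)                     # number of observations (4 here)
d <- sweep(X, 2, colMeans(X))    # subtract each column's mean from that column

# Sum of outer products over all n rows, as the formula is written
S_sum <- matrix(0, ncol(X), ncol(X),
                dimnames = list(colnames(X), colnames(X)))
for (j in seq_len(n)) {
  S_sum <- S_sum + d[j, ] %*% t(d[j, ])
}

# Assumption: the text's "sample covariance" is meant to carry a 1/(n - 1) factor
S_scaled <- S_sum / (n - 1)

# Compare numerically with cov(), ignoring dimnames
isTRUE(all.equal(S_scaled, cov(X), check.attributes = FALSE))

# The loop is equivalent to a single matrix product on the centered data
isTRUE(all.equal(S_sum, crossprod(d), check.attributes = FALSE))
```

If both comparisons hold, the gap between the two results above would come from the loop bounds and the missing divisor rather than from the formula.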
