Implementing covariance by hand

54 Views Asked by At

I made a curious observation today - I implemented the covariance of two vectors 3 different ways, and got 2 different answers. In R, methods 1 and 2 are the same. Method 3 ought to be mathematically the same, but somehow it returns a different number. What is the bug here?

# Make some data
set.seed(100)
n = 100
foo = rnorm(n)
bar = rnorm(n)

# method 1
cov(foo, bar)

# method 2
sum(
  (foo - mean(foo)) * (bar - mean(bar))
) / (n-1)

# method 3
(
  sum(foo*bar) - 
    mean(bar)*sum(foo) -
    mean(foo)*sum(bar) + 
    mean(foo)*mean(bar)
) / (n-1)
1

There are 1 best solutions below

0
Ben Bolker On BEST ANSWER

You forgot that the mathematical expression sum_{i=1}^n mean(foo)*mean(bar) comes out to n*mean(foo)*mean(bar), not mean(foo)*mean(bar) ...

(sum(foo*bar) - 
    mean(bar)*sum(foo) -
    mean(foo)*sum(bar) + n*mean(foo)*mean(bar))/(n-1)