It was my understanding that numpy.correlate and numpy.corrcoef should yield the same result for aligned normalized vectors. Two immediate cases to the contrary:
from math import isclose as near

import numpy as np


def normalizedCrossCorrelation(a, b):
    assert len(a) == len(b)
    normalized_a = [aa / np.linalg.norm(a) for aa in a]
    normalized_b = [bb / np.linalg.norm(b) for bb in b]
    return np.correlate(normalized_a, normalized_b)[0]


def test_normalizedCrossCorrelationOfSimilarVectorsRegression0():
    v0 = [1, 2, 3, 2, 1, 0, -2, -1, 0]
    v1 = [1, 1.9, 2.8, 2, 1.1, 0, -2.2, -0.9, 0.2]
    assert near(normalizedCrossCorrelation(v0, v1), 0.9969260391224474)
    print(f"{np.corrcoef(v0, v1)=}")
    assert near(normalizedCrossCorrelation(v0, v1), np.corrcoef(v0, v1)[0, 1])


def test_normalizedCrossCorrelationOfSimilarVectorsRegression1():
    v0 = [1, 2, 3, 2, 1, 0, -2, -1, 0]
    v1 = [0.8, 1.9, 2.5, 2.1, 1.2, -0.3, -2.4, -1.4, 0.4]
    assert near(normalizedCrossCorrelation(v0, v1), 0.9809817769512982)
    print(f"{np.corrcoef(v0, v1)=}")
    assert near(normalizedCrossCorrelation(v0, v1), np.corrcoef(v0, v1)[0, 1])

Pytest output:
E assert False
E + where False = near(0.9969260391224474, 0.9963146417122921)
E + where 0.9969260391224474 = normalizedCrossCorrelation([1, 2, 3, 2, 1, 0, ...], [1, 1.9, 2.8, 2, 1.1, 0, ...])
E assert False
E + where False = near(0.9809817769512982, 0.9826738919606931)
E + where 0.9809817769512982 = normalizedCrossCorrelation([1, 2, 3, 2, 1, 0, ...], [0.8, 1.9, 2.5, 2.1, 1.2, -0.3, ...])
I think your formula with np.correlate is wrong: it does not yield the correlation coefficient. Consider the first example.
The correct answer, computed without floating point, is 59 Sqrt[5/17534], which is approximately 0.99631464171229218403 and agrees with the value from np.corrcoef.

Take into account that np.correlate(a, b), when a and b are 1-D arrays of the same size and the default mode is used, returns the scalar product (i.e. np.dot(a, b)) as a one-element array. The covariance can be computed (even if it is not recommended) as E[v0 v1] - E[v0]E[v1], which is equal to np.cov(v0, v1, ddof=0)[0][1]. So you can compute the correlation by dividing this covariance by the product of the two standard deviations, np.std(v0) * np.std(v1).

By the way, just use np.corrcoef or np.cov.

Math explanation
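Your formula using np.correlate is equivalent to:

E[v0 v1] / Sqrt[E[v0^2] E[v1^2]]

where E is the sample mean. But the correlation coefficient is computed as:

(E[v0 v1] - E[v0]E[v1]) / Sqrt[(E[v0^2] - E[v0]^2) (E[v1^2] - E[v1]^2)]

The two expressions coincide when both vectors have zero mean, which is not the case for v0 and v1 here.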
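
As a quick numerical check of the points above, here is a minimal sketch using the vectors from the first test. The helper names (uncentered_correlation, covariance_from_moments, pearson_from_moments) are hypothetical and written only for this illustration; they are not NumPy functions. It assumes the population convention (ddof=0) throughout.

```python
import numpy as np

v0 = np.array([1, 2, 3, 2, 1, 0, -2, -1, 0], dtype=float)
v1 = np.array([1, 1.9, 2.8, 2, 1.1, 0, -2.2, -0.9, 0.2])


def uncentered_correlation(a, b):
    """E[a b] / sqrt(E[a^2] E[b^2]): the quantity the question's function computes."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def covariance_from_moments(a, b):
    """E[a b] - E[a] E[b], with E taken as the sample mean (ddof=0)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.mean(a * b) - np.mean(a) * np.mean(b)


def pearson_from_moments(a, b):
    """Covariance divided by the product of the (ddof=0) standard deviations."""
    return covariance_from_moments(a, b) / (np.std(a) * np.std(b))


# Same number as the question's normalizedCrossCorrelation(v0, v1).
print(uncentered_correlation(v0, v1))

# Matches np.cov with ddof=0.
print(covariance_from_moments(v0, v1), np.cov(v0, v1, ddof=0)[0][1])

# Matches np.corrcoef.
print(pearson_from_moments(v0, v1), np.corrcoef(v0, v1)[0, 1])

# Subtracting the means first makes the uncentered formula agree with np.corrcoef.
print(uncentered_correlation(v0 - v0.mean(), v1 - v1.mean()))
```

If the np.correlate-based function is to be kept, subtracting each vector's mean before normalizing should make it agree with np.corrcoef, as the last line suggests; calling np.corrcoef directly remains the simpler option.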