Difference in values between numpy.correlate and numpy.corrcoef?

Question

133 Views Asked by user2183336 At 28 August 2023 at 12:06

It was my understanding that numpy.correlate and numpy.corrcoef should yield the same result for aligned normalized vectors. Two immediate cases to the contrary:

from math import isclose as near
import numpy as np


def normalizedCrossCorrelation(a, b):
    assert len(a) == len(b)
    normalized_a = [aa / np.linalg.norm(a) for aa in a]
    normalized_b = [bb / np.linalg.norm(b) for bb in b]
    return np.correlate(normalized_a, normalized_b)[0]


def test_normalizedCrossCorrelationOfSimilarVectorsRegression0():
    v0 = [1, 2, 3, 2, 1, 0, -2, -1, 0]
    v1 = [1, 1.9, 2.8, 2, 1.1, 0, -2.2, -0.9, 0.2]
    assert near(normalizedCrossCorrelation(v0, v1), 0.9969260391224474)
    print(f"{np.corrcoef(v0, v1)=}")
    assert near(normalizedCrossCorrelation(v0, v1), np.corrcoef(v0, v1)[0, 1])


def test_normalizedCrossCorrelationOfSimilarVectorsRegression1():
    v0 = [1, 2, 3, 2, 1, 0, -2, -1, 0]
    v1 = [0.8, 1.9, 2.5, 2.1, 1.2, -0.3, -2.4, -1.4, 0.4]
    assert near(normalizedCrossCorrelation(v0, v1), 0.9809817769512982)
    print(f"{np.corrcoef(v0, v1)=}")
    assert near(normalizedCrossCorrelation(v0, v1), np.corrcoef(v0, v1)[0, 1])

Pytest output:

E       assert False
E        +  where False = near(0.9969260391224474, 0.9963146417122921)
E        +    where 0.9969260391224474 = normalizedCrossCorrelation([1, 2, 3, 2, 1, 0, ...], [1, 1.9, 2.8, 2, 1.1, 0, ...])


E       assert False
E        +  where False = near(0.9809817769512982, 0.9826738919606931)
E        +    where 0.9809817769512982 = normalizedCrossCorrelation([1, 2, 3, 2, 1, 0, ...], [0.8, 1.9, 2.5, 2.1, 1.2, -0.3, ...])


**Ruggero Turra** · Answer 1 · 2023-08-28T13:27:01.050000

I think your formula with np.correlate is wrong: it does not yield the correlation coefficient.

Consider the first example:

v0 = [1, 2, 3, 2, 1, 0, -2, -1, 0]
v1 = [1, 1.9, 2.8, 2, 1.1, 0, -2.2, -0.9, 0.2]


np.correlate(v0 / np.linalg.norm(v0), v1 / np.linalg.norm(v1))[0] # 0.9969260391224474
# you can also use
#    np.correlate(v0 , v1 , mode='valid') / np.linalg.norm(v0) / np.linalg.norm(v1)
# but you get same number
np.corrcoef(v0, v1)[0][1]                                         # 0.9963146417122921

The correct answer, computed without floating point, is 59 * sqrt(5/17534), which is approximately 0.99631464171229218403 and agrees with the np.corrcoef result.
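As a rough check of that exact value, here is a minimal sketch using only the standard library. It assumes the decimal literals in v1 are meant exactly (1.9 is 19/10, and so on), and the helper names mean and exact_r_squared are made up for this illustration. It computes the square of the correlation coefficient as a rational number, so floating point enters only in the final square root.

```python
from fractions import Fraction
from math import isclose, sqrt

# Same data as in the question, held as exact rationals.
# Decimal strings avoid the rounding that float literals would introduce.
v0 = [Fraction(x) for x in (1, 2, 3, 2, 1, 0, -2, -1, 0)]
v1 = [Fraction(s) for s in ("1", "1.9", "2.8", "2", "1.1", "0", "-2.2", "-0.9", "0.2")]


def mean(values):
    """Exact arithmetic mean of a list of Fractions."""
    return sum(values) / len(values)


def exact_r_squared(a, b):
    """Square of the Pearson correlation coefficient, computed exactly."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab ** 2 / (saa * sbb)


r_squared = exact_r_squared(v0, v1)  # Fraction(17405, 17534), i.e. 59**2 * 5 / 17534
r = sqrt(r_squared)                  # only this step uses floating point
print(r_squared, r)
```

The centered cross term is positive for this data, so taking the positive square root gives the coefficient itself.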

Take into account that

np.correlate(a, b)

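when a and b are 1-D arrays of the same size, returns a one-element array holding their scalar product (i.e. np.dot(a, b)), because the default mode is 'valid'. The covariance can be computed as E[v0 v1] - E[v0]E[v1], although this is not recommended, since subtracting two nearly equal numbers can lose precision. This can be done as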

(np.correlate(v0 , v1 , mode='valid') / len(v0) - np.mean(v0) * np.mean(v1))[0]

This is equal to np.cov(v0, v1, ddof=0)[0][1]. So you can compute the correlation as

((np.correlate(v0 , v1 , mode='valid') / len(v0) - np.mean(v0) * np.mean(v1)) / np.std(v0) / np.std(v1))[0]

In practice, though, just use np.corrcoef or np.cov.
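To see the pieces above side by side, here is a small sketch that reuses the first example from the question. The variable names dot, cov and corr are chosen only for this illustration. Each intermediate value can be compared against the dedicated NumPy function.

```python
import numpy as np

v0 = np.array([1, 2, 3, 2, 1, 0, -2, -1, 0], dtype=float)
v1 = np.array([1, 1.9, 2.8, 2, 1.1, 0, -2.2, -0.9, 0.2])

# For equal-length 1-D inputs, mode="valid" yields a single value: the dot product.
dot = np.correlate(v0, v1, mode="valid")[0]

# Population covariance (ddof=0) as E[v0 v1] - E[v0] E[v1].
cov = dot / len(v0) - v0.mean() * v1.mean()

# np.std defaults to ddof=0, matching the covariance above.
corr = cov / (v0.std() * v1.std())

print(np.isclose(dot, np.dot(v0, v1)))
print(np.isclose(cov, np.cov(v0, v1, ddof=0)[0, 1]))
print(np.isclose(corr, np.corrcoef(v0, v1)[0, 1]))
```

All three comparisons should print True for this data. The choice of ddof does not affect the final coefficient as long as the covariance and both standard deviations use the same one.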

Math explanation

Your formula using np.correlate is equivalent to:

E[v0 * v1] / sqrt( E[v0 ** 2] E[v1 ** 2] )

where E is the sample mean. But the correlation coefficient can be computed as

(E[v0 * v1] - E[v0] * E[v1]) / sqrt( (E[v0 ** 2] - E[v0] ** 2) * (E[v1 ** 2] - E[v1] ** 2) )
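The two expressions coincide when both vectors have zero mean, which suggests one possible fix for the function in the question: subtract the mean from each vector before normalizing. The sketch below is an assumption about what the asker wants, not part of the original answer, and the name centered_normalized_correlation is hypothetical.

```python
import numpy as np


def centered_normalized_correlation(a, b):
    """Normalized cross-correlation at zero lag, with the means removed first.

    Hypothetical variant of the question's normalizedCrossCorrelation.
    A constant input has zero norm after centering, so the result is
    undefined (division by zero) in that case.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    assert a.shape == b.shape
    a = a - a.mean()  # centering makes E[a] = 0
    b = b - b.mean()
    return np.correlate(a / np.linalg.norm(a), b / np.linalg.norm(b))[0]


v0 = [1, 2, 3, 2, 1, 0, -2, -1, 0]
v1 = [1, 1.9, 2.8, 2, 1.1, 0, -2.2, -0.9, 0.2]
v2 = [0.8, 1.9, 2.5, 2.1, 1.2, -0.3, -2.4, -1.4, 0.4]

print(centered_normalized_correlation(v0, v1), np.corrcoef(v0, v1)[0, 1])
print(centered_normalized_correlation(v0, v2), np.corrcoef(v0, v2)[0, 1])
```

With centering in place, both test cases from the question should agree with np.corrcoef up to floating-point rounding.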
