Finding where an excerpt starts in an audio file: cross-correlation coefficients between two arrays in Python

Question

Finding where an excerpt starts in an audio file: cross-correlation coefficients between two arrays in Python

45 Views Asked by Paidon At 20 March 2024 at 12:55

I have been pulling my hair on a problem, and all of the answers I found on StackOverflow so far did not help - so I am asking for your help.

The overall problem

I would like to create a function that finds the exact timestamp where an audio excerpt starts in a larger audio file. For test purposes, I used a 5 minutes audio file and a 43 second excerpt of it. Below, I aligned the two audio files in Audacity: the excerpt starts exactly at 00:01:55.554920.

I would also like the function to return a value if, and only if it has a confidence value over a certain threshold, that would actually be a parameter of the function. The way I intend to do it is by checking if the correlation coefficient between the two aligned signals is over the given threshold.

In other words, here is a simplified version of the code:

find_excerpt_starting_sample(original_audio, excerpt, threshold):

    # Find the cross-correlation coefficients for each lag
    xcorr = cross_correlation(original_audio, excerpt)

    # Return the lag of the max correlation if it is over threshold
    if np.max(xcorr) > threshold:
        return np.argmax(xcorr)
    else:
        raise Exception("No correlation over threshold found.")

I have been having a lot of troubles finding the right cross_correlation function, because none of my attempts returned an array that would be between 0 and 1.

The problem, simplified

As my attempts on the audio files have been inconclusive, I have tried to perform the same thing on two numerical arrays:

y1 = [2, 22, 14, 8, 0, 4, 8, 16, 26, 6, 12, 14, 16, 2, 6]
y2 = [4, 8, 16, 26, 6, 12]

Here, y2 contains a subset of y1 (starting at index 5). In order to make sure that the function works independently of the amplitude scale, I halved all of the values of y2:

y1 = [2, 22, 14, 8, 0, 4, 8, 16, 26, 6, 12, 14, 16, 2, 6]
y2 = [2, 4, 8, 13, 3, 6]

I would like to create a cross-correlation function that returns an array where the value at lag 5 is 1.

My attempts so far

np.corrcoef

If we just do a simple correlation and slide the excerpt along the original audio, it works:

import numpy as np
import matplotlib as plt

corr = np.zeros(len(y1) - len(y2))
for i in range(len(y1) - len(y2)):
    corr[i] = np.corrcoef(y1[i:i+len(y2)], y2)[0][1]

print(corr)
plt.plot(corr)
plt.show()

The output is:

[ 0.18961375 -0.71250433 -0.56075283 -0.08468414  0.21913077  1. -0.04179451 -0.46803451 -0.24815461]

The problem is that this technique is really, really not efficient for longer files.

scipy.signal.correlate

Now, instead of reinventing the wheel, I started using one of the main solutions found on Stack Overflow, i.e. the correlate function of scipy.signal. It returns values where the proper lag is found. However, as is, because it performs a convolution, there is no way to quantify the correlation.

from scipy import signal

xcorr = signal.correlate(y1, y2, mode="full")
lags = signal.correlation_lags(len(y1), len(y2), mode="full")

print(xcorr)
print(lags)
plt.plot(lags, xcorr)
plt.show()

The output is:

[ 12 138 176 392 390 332 224 232 356 402 596 486 478 414 422 252 186  88 28  12]
[-5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]

I saw a few solutions, but they do not work as I intend.

First, a solution here suggests to normalize the coefficient using this function:

corr = signal.correlate(y1 / np.std(y1), y2 / np.std(y2), 'full') / min(len(y1), len(y2))
lags = signal.correlation_lags(len(y1), len(y2), mode="full")
print(c)
plt.plot(lags, c) 
plt.show()

The output is:

[0.0736392  0.84685082 1.08004163 2.40554727 2.39327407 2.03735126 1.37459844 1.42369124 2.18462966 2.46691327 3.65741371 2.98238769 2.93329489 2.54055247 2.58964528 1.54642325 1.14140763 0.54002082 0.17182481 0.0736392 ]

As you can see, the maximum value is not 1, but 3.65741371.

I then tried another solution found here:

y1n = y1 / np.std(y1)
y2n = y2 / np.std(y2)
xcorr = signal.correlate(y1n, y2n, mode="full")
lags = signal.correlation_lags(len(y1), len(y2), mode="full")
print(xcorr)
plt.plot(lags, xcorr)
plt.show()

The output is:

[ 0.44183521  5.08110495  6.48024979 14.43328362 14.35964442 12.22410756  8.24759064  8.54214745 13.10777798 14.80147963 21.94448224 17.89432612 17.59976931 15.24331484 15.53787165  9.27853947  6.8484458   3.24012489  1.03094883  0.44183521]

Once again, the max value for the cross-correlation is not 1 but 21.94448224

A cry for help

There is a lot I do not know about correlation - I dug into it, but before going deeper I'm asking, if one of you happened to be able to point me in the right direction, and what I did wrong so far.

Thanks a lot!

Original Q&A

There are 2 best solutions below

**Onyambu** · Answer 1 · 2024-03-20T16:50:14.673000

You can use convolution for this:

def cross_corr(x, y):
  x = np.array(x)
  y = np.array(y[::-1])
  yi = (y - y.mean())/ y.std() / np.sqrt(y.size)
  x_m = np.convolve(x, np.ones(yi.size), 'valid')**2/yi.size
  x_m2 = np.convolve(x**2, np.ones(yi.size), 'valid')
  return np.convolve(x, yi, 'valid')/np.sqrt(x_m2 - x_m)

cross_corr(y1,y2)
array([ 0.18961375, -0.71250433, -0.56075283, -0.08468414,  0.21913077,
        1.        , -0.04179451, -0.46803451, -0.24815461,  0.77716484])

This function is a lot of magnitudes faster to the original solution

**Paidon** · Answer 2 · 2024-03-20T18:45:46.140000

I adapted the code from @Onyambu (thanks again for your help!) in order to replace the convolutions by signal.correlate(). Here is what I got:

import numpy as np

def cross_correlation(y1, y2):
    y2_normalized = (y2 - y2.mean()) / y2.std() / np.sqrt(y2.size)
    y1_m = signal.correlate(y1, np.ones(y2.size), 'valid') ** 2 / y2_normalized.size
    y1_m2 = signal.correlate(y1 ** 2, np.ones(y2.size), "valid")
    cross_correlation = signal.correlate(y1, y2_normalized, "valid") / np.sqrt(y1_m2 - y1_m)

The code is much faster when cross-correlating audio files; the execution of the code took less than 2 seconds for the audio clips I mentioned at the beginning of my problem (that are 5 minutes and 43 seconds, respectively, sampled at 44100 Hz).

Here is the output:

The peak has a value of 1.0, and is precise to the millisecond.

Note: before performing the cross-correlation, I got the envelope of both of the audio files using y_env = np.abs(scipy.signal.hilbert(y)) and a low-pass filter at 50 Hz using b, a = scipy.signal.butter(2, filter_over, "low", fs=44100) and y_filt = lfilter(b, a, y_env). You can also perform a downsampling at any point in case you have very long data.

Finding where an excerpt starts in an audio file: cross-correlation coefficients between two arrays in Python

The overall problem

The problem, simplified

My attempts so far

np.corrcoef

scipy.signal.correlate

A cry for help

There are 2 best solutions below

Related Questions in PYTHON

Related Questions in NUMPY

Related Questions in SCIPY

Related Questions in CORRELATION

Related Questions in CROSS-CORRELATION

Trending Questions

Popular # Hahtags

Popular Questions