Applying Discrete Cosine Transform to Mel Spectrogram to Obtain MFCC


I'm trying to replicate the process of computing MFCCs from an audio file. So far I have obtained the mel spectrogram, and the last step is to apply a discrete cosine transform (DCT) to it. I tried applying scipy's dct() function to the spectrogram, but the result isn't what I expect. I cross-checked against Librosa's MFCC function, and the output is still different. Please help, and thank you in advance!

Here is the code I used to generate the mel spectrogram:

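import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
from scipy import fft

# Function to perform STFT on each window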
def stft(signal, windowSize, windowStep):

    # Frame number estimation
    n_frames = 1 + int((len(signal)-windowSize)/windowStep)

    # Initialize empty matrix, to store STFT result
    stft_matrix = np.zeros((n_frames, int(windowSize/2)+1),dtype=np.float32)

    # Loop to perform STFT, keep only the non-negative frequency bins (0 Hz up to Nyquist)
    for i in range(n_frames):
        start = i * windowStep
        end = start + windowSize
        frame = signal[start:end]*np.hanning(windowSize)
        frame_fft = np.fft.fft(frame)[:int(windowSize/2)+1]
        stft_matrix[i, :] = np.abs(frame_fft)
    return stft_matrix  

# Input signal
wav_name = '0015_000009_neutral.wav'
x, sr = librosa.load(wav_name, sr=None) # sr=None keeps the file's native sampling rate

# Initialize window step and length
window_size = 0.025  # 25 ms
window_step = 0.010  # 10 ms
stft_matrix = stft(x, int(window_size * sr), int(window_step * sr))

# Plot vanilla spectrogram

# Transpose
stftTranspose = stft_matrix.transpose()

# Convert STFT to dB-scaled spectrogram
spectrogram = librosa.amplitude_to_db(stftTranspose, ref=np.max)

# Set up x-axis and y-axis parameters
time_axis = np.arange(spectrogram.shape[1])
freq_axis = np.arange(spectrogram.shape[0])

# Plot the spectrogram
librosa.display.specshow(spectrogram, x_axis='time', y_axis='linear', sr=sr, hop_length=int(window_step * sr))

# Add colorbar and labels
plt.colorbar(format='%+2.0f dB')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')

# Constructing Mel Filterbank
frameSize = int(window_size*sr)
hopLength = int(window_step*sr)

melFilters = librosa.filters.mel(n_fft=frameSize, sr=sr, n_mels=128)
melFilters.shape

# Librosa's default filters are Slaney-normalized (area-normalized) triangles;
# dividing each filter by its peak turns them into unit-height triangular filters
melFilters /= np.max(melFilters, axis=-1)[:, None]
plt.plot(melFilters.T)

# Matrix multiplication between Mel Filterbank and Spectrogram
melSpec = np.dot(melFilters, stftTranspose**2 )
melSpec.shape

# Log (melSpec is built from the squared magnitude, so it is a power spectrogram)
logMel = librosa.power_to_db(melSpec, ref=np.max)
logMel.shape

# Plotting the mel spectrogram
plt.figure(figsize=(25, 10))
librosa.display.specshow(logMel, sr=sr, hop_length=hopLength, x_axis='time', y_axis='mel', fmax=sr/2)
plt.colorbar(format='%+2.f dB')
plt.title('Mel spectrogram')

# Trying to apply DCT to the Mel Spectrogram
mfcc = fft.dct(logMel)
mfcc.shape

plt.figure(figsize=(25, 10))
librosa.display.specshow(mfcc, sr=sr, hop_length=hopLength, x_axis='time', y_axis='mel', fmax=sr/2)
plt.colorbar(format='%+2.f dB')
plt.title('MFCC')

The plotted MFCC isn't the same as Librosa's MFCC plot. How should I apply the DCT to the mel spectrogram? Here is the MFCC plot comparison:

MFCC plot from my original code

MFCC plot using Librosa's MFCC function
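One likely source of the mismatch is the axis and normalization of the DCT. scipy's dct() defaults to the last axis, which for a (n_mels, n_frames) array like logMel is time, and to norm=None, and it returns as many coefficients as there are inputs. Librosa's MFCC function, by default, applies an orthonormal type-2 DCT along the mel axis and keeps only the first 20 coefficients. The sketch below illustrates that difference on synthetic data; the helper name mfcc_from_log_mel and the random stand-in array are assumptions for illustration, not part of the code above.

```python
import numpy as np
from scipy import fft


def mfcc_from_log_mel(log_mel, n_mfcc=20):
    """Hypothetical helper: DCT-II over the mel axis of a (n_mels, n_frames) array."""
    # axis=0 transforms each frame's mel bands; norm='ortho' gives the orthonormal scaling
    coeffs = fft.dct(log_mel, type=2, axis=0, norm='ortho')
    # Keep only the lowest-order cepstral coefficients
    return coeffs[:n_mfcc, :]


# Synthetic stand-in for logMel: 128 mel bands, 50 frames (assumed sizes)
rng = np.random.default_rng(0)
log_mel = rng.normal(size=(128, 50))

mfcc = mfcc_from_log_mel(log_mel)

# Default call: transforms along time and keeps the input shape
default_dct = fft.dct(log_mel)

print(mfcc.shape)
print(default_dct.shape)
```

For a like-for-like check, the same log-mel array can be passed to Librosa as librosa.feature.mfcc(S=logMel), since calling it on the raw signal uses Librosa's own defaults (n_fft=2048, hop_length=512, Slaney-normalized filters), which differ from the 25 ms / 10 ms framing used here. The rows of the result are coefficient indices rather than frequencies, so plotting with y_axis='mel' would mislabel the vertical axis.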
