Mel Spectrogram Implementation in Python


I am a beginner trying to replicate the process of computing a Mel spectrogram from an audio file. As a first step, I want to split the signal into frames using a Hann (Hanning) or Hamming window with a 25 ms window length and a 10 ms step, and then apply the Fourier transform to each frame. I'm quite lost at this step, so I'm asking for your help.

Here's the code I tried:

import numpy as np
import librosa
import matplotlib.pyplot as plt

def stft(signal, windowSize, windowStep):

    # Number of full frames that fit in the signal
    n_frames = 1 + (len(signal) - windowSize) // windowStep

    # The window is the same for every frame, so compute it once
    window = np.hanning(windowSize)

    # One row per frame, one column per non-negative frequency bin
    stft_matrix = np.zeros((n_frames, windowSize // 2 + 1), dtype=np.float32)

    # Magnitude spectrum of each windowed frame; rfft keeps bins 0..Nyquist
    for i in range(n_frames):
        start = i * windowStep
        end = start + windowSize
        frame = signal[start:end] * window
        stft_matrix[i, :] = np.abs(np.fft.rfft(frame))
    return stft_matrix

wav_name = 'Q4.wav'
x, sr = librosa.load(wav_name, sr=None)

window_size = 0.025  # 25 ms
window_step = 0.010  # 10 ms
stft_matrix = stft(x, int(window_size * sr), int(window_step * sr))

# Plot the magnitude spectrum of the first frame against frequency in Hz
freqs = np.fft.rfftfreq(int(window_size * sr), d=1.0 / sr)
plt.plot(freqs, stft_matrix[0, :])
plt.xlabel('Frequency (Hz)')
plt.ylabel('Magnitude')
plt.show()

print(stft_matrix[10, :].shape)

I expected that changing the window size would leave the magnitudes unchanged. Instead, when I change it, the magnitudes change and the range of the frequency axis changes, although the overall shape of the plot looks about the same. I'm sorry if this is an idiotic question, but am I doing something wrong? Please help me.

This is the plot when the window size is int(window_size * sr):

This is the plot when I arbitrarily set the window size to 2048 samples:
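
For reference, the behaviour described above can be reproduced without an audio file. The following is only a minimal sketch under stated assumptions: the 16 kHz sample rate, the 1 kHz test tone with amplitude 0.5, and the helper name frame_magnitude are all made up for illustration and are not part of the code above. It suggests that the raw FFT magnitude of a windowed frame grows roughly with the sum of the window (and so with its length), and that the number of frequency bins depends on the window length, while dividing by half the window sum gives a peak close to the tone's amplitude for both sizes.

```python
import numpy as np

def frame_magnitude(signal, sr, n_window, normalize=False):
    """Magnitude spectrum of the first Hann-windowed frame of `signal`.

    Hypothetical helper for illustration only.
    """
    window = np.hanning(n_window)
    spectrum = np.abs(np.fft.rfft(signal[:n_window] * window))
    if normalize:
        # Dividing by sum(window) / 2 makes a sine of amplitude A peak near A
        spectrum = spectrum / (window.sum() / 2)
    freqs = np.fft.rfftfreq(n_window, d=1.0 / sr)
    return freqs, spectrum

sr = 16000                                   # assumed sample rate
t = np.arange(sr) / sr                       # one second of samples
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)    # assumed 1 kHz test tone

results = {}
for n_window in (400, 2048):                 # 25 ms at 16 kHz, and 2048 samples
    freqs, raw = frame_magnitude(tone, sr, n_window)
    _, norm = frame_magnitude(tone, sr, n_window, normalize=True)
    results[n_window] = {
        "n_bins": len(freqs),
        "peak_hz": freqs[np.argmax(raw)],
        "raw_peak": raw.max(),
        "norm_peak": norm.max(),
    }
    print(n_window, results[n_window])
```

In this sketch the peak stays at the same frequency for both window sizes; only the bin spacing (sr / n_window) and the unnormalized height differ, which would be consistent with the two plots looking alike but having different scales.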

Once again I'm sorry if this seems idiotic, but I'd appreciate your help. Thanks!
