In my app, I receive an array of audio samples (sample rate = 8000 Hz) that was loaded with torchaudio.load. I need to run whisper (STT) on this array.
For efficiency, I want to avoid loading the wav file a second time with whisper.load_audio, so I need to resample the array to 16000 Hz myself.
whisper.load_audio uses ffmpeg to load the audio and resample it to 16000 Hz. I tried resampling the array with librosa and with torchaudio, but the output never matches what ffmpeg produces. (I assume that resampling differently from the method used when the whisper model was trained could give bad results.)
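For context, here is a rough sketch of what whisper.load_audio appears to do, based on my reading of the openai-whisper source; the exact flags may differ between versions. The helper names (build_ffmpeg_cmd, pcm_s16le_to_float32, load_audio_like_whisper) are hypothetical. ffmpeg decodes to mono 16-bit PCM at the target rate and the samples are then divided by 32768, which is consistent with the whisper.load_audio values in the example below being multiples of 1/32768 (for instance -0.00082397 is -27/32768).

```python
import subprocess
from typing import List

import numpy as np


def build_ffmpeg_cmd(path: str, sr: int = 16000) -> List[str]:
    """Command that decodes `path` to mono 16-bit PCM at `sr` Hz on stdout.

    Assumption: mirrors the flags used by whisper.load_audio.
    """
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(sr),
        "-",
    ]


def pcm_s16le_to_float32(raw: bytes) -> np.ndarray:
    """Convert little-endian int16 PCM bytes to float32 in [-1.0, 1.0)."""
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0


def load_audio_like_whisper(path: str, sr: int = 16000) -> np.ndarray:
    """Run ffmpeg (must be on PATH) and return the decoded waveform."""
    out = subprocess.run(
        build_ffmpeg_cmd(path, sr), capture_output=True, check=True
    ).stdout
    return pcm_s16le_to_float32(out)
```

Because the output passes through int16, any resampler that works purely in floating point will differ from it at least by this quantization step, even if the resampling filter itself were identical.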
Example:
Loading test.wav (SR = 8000) and printing the first 5 values:
whisper_audio = whisper.load_audio(file) => [-0.00082397 -0.00115967 -0.00186157 -0.00231934 -0.00222778, ...]
Loading with torchaudio and resampling with librosa:
librosa.resample(vad_audio, orig_sr=8000, target_sr=16000, scale=True, res_type='kaiser_best')
=> [-0.00082317 -0.0010577 -0.0013937 -0.0016688 -0.00186235, ...]
The values are different.
How can I resample the audio in exactly the way ffmpeg does it?
You can use torchaudio.io.StreamReader to load and resample audio. This functionality is implemented with ffmpeg, so you might be able to produce the same waveform.
When you use the add_basic_audio_stream method with the sample_rate option, it uses FFmpeg's filter functionality to apply resampling.
https://pytorch.org/audio/2.1.1/generated/torchaudio.io.StreamReader.html#add-basic-audio-stream
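A minimal sketch of that approach, assuming a torchaudio version that still ships torchaudio.io (the linked docs are for 2.1.1) and an FFmpeg installation that torchaudio can find. The function name load_with_stream_reader is hypothetical, and the result is not guaranteed to be bit-identical to whisper.load_audio.

```python
import numpy as np


def load_with_stream_reader(path: str, sample_rate: int = 16000) -> np.ndarray:
    """Decode `path` and resample it with FFmpeg via torchaudio's StreamReader."""
    # Imported lazily so this sketch can be imported even where
    # torchaudio's FFmpeg extension is not available.
    from torchaudio.io import StreamReader

    reader = StreamReader(path)
    reader.add_basic_audio_stream(
        frames_per_chunk=-1,      # -1: no chunking, return everything at once
        buffer_chunk_size=-1,     # -1: never drop buffered frames
        sample_rate=sample_rate,  # resampled by FFmpeg's filter
        num_channels=1,           # mono
    )
    reader.process_all_packets()
    (chunk,) = reader.pop_chunks()  # one output stream -> one chunk
    # chunk has shape (frames, channels); keep the single channel.
    return chunk[:, 0].numpy()
```

With the default format ("fltp") the samples are float32. Comparing the first few values against whisper.load_audio on the same file is a quick way to check how close the two are.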
If the ffmpeg command uses a non-default resampling method, you need to construct the same filter description and pass it to the add_audio_stream method.
https://pytorch.org/audio/2.1.1/generated/torchaudio.io.StreamReader.html#add-audio-stream
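As an illustration of passing a custom filter description, here is a sketch under the same assumptions as above (torchaudio.io available, FFmpeg installed). The helpers build_resample_filter and load_with_filter are hypothetical. The options after the sample rate are forwarded to FFmpeg's aresample filter, so they must be options your FFmpeg build supports; for example, resampler=soxr only works if FFmpeg was built with libsoxr.

```python
def build_resample_filter(sample_rate: int, sample_fmt: str = "fltp", **swr_options) -> str:
    """Build an FFmpeg filter description that resamples and sets the sample format.

    Extra keyword arguments are appended as aresample options,
    e.g. resampler="soxr" -> "aresample=16000:resampler=soxr,...".
    """
    parts = [str(sample_rate)] + [f"{key}={value}" for key, value in swr_options.items()]
    return f"aresample={':'.join(parts)},aformat=sample_fmts={sample_fmt}"


def load_with_filter(path: str, filter_desc: str):
    """Decode `path` through a custom FFmpeg filter graph; returns a (frames, channels) tensor."""
    # Imported lazily so this sketch can be imported even where
    # torchaudio's FFmpeg extension is not available.
    from torchaudio.io import StreamReader

    reader = StreamReader(path)
    reader.add_audio_stream(
        frames_per_chunk=-1,   # -1: no chunking
        buffer_chunk_size=-1,  # -1: never drop buffered frames
        filter_desc=filter_desc,
    )
    reader.process_all_packets()
    (chunk,) = reader.pop_chunks()
    return chunk
```

For example, load_with_filter("test.wav", build_resample_filter(16000)) uses FFmpeg's default resampler, which should be closest to a plain `-ar 16000` on the command line.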