I am trying to capture audio from the microphone with the sounddevice module's rec() function, store it as float32, and feed it to Whisper's transcribe() function. I don't want to save the audio to a file and load it back.
Is there a way to pass a float32 array to transcribe()?
Here is my attempt. I tried converting the data with np.array(), but it failed miserably.
import sounddevice as sd
import numpy as np
import whisper
duration = 15
samplerate = 44100
frames = duration * samplerate
recording = sd.rec(frames, blocking=True, dtype='float32')
model = whisper.load_model("tiny")
rec_array = np.array(recording, dtype=np.float32)
result = model.transcribe(recording, word_timestamps=True, fp16=False)
text = result["text"].strip()
print(text)
The error:
RuntimeError: [enforce fail at alloc_cpu.cpp:80] data. DefaultCPUAllocator: not enough memory: you tried to allocate 846721764000 bytes.
That is about 847 gigabytes, so I have clearly broken something.
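One possible reading of that number, offered as a sketch rather than a confirmed diagnosis: sd.rec() returns a 2-D array of shape (frames, channels), and Whisper pads its input along the last axis by N_SAMPLES (480000 samples, i.e. 30 s at 16 kHz) before computing the spectrogram. Assuming the failing run recorded 441000 frames (10 s at 44100 Hz) on a single channel, which is a guess because the script above uses 15 s, the padded float32 array would need exactly the amount of memory reported:

```python
# Assumed shape of the array returned by sd.rec() in the failing run:
# 10 s * 44100 Hz frames, 1 channel (an assumption, not stated in the post).
frames, channels = 441_000, 1

# Same value as whisper.audio.N_SAMPLES: 30 s of audio at 16 kHz.
n_samples = 30 * 16_000

# Padding the last axis turns (frames, channels) into (frames, channels + n_samples).
padded_elements = frames * (channels + n_samples)
bytes_needed = padded_elements * 4  # float32 takes 4 bytes per element

print(bytes_needed)  # 846721764000
```

If that reading is right, the array is being treated as 441000 separate one-sample clips instead of one 441000-sample clip, so the shape of the input matters more than its dtype.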
Edit: full traceback:
Traceback (most recent call last):
  File "d:\Seshrut\Error-505!!\projects\learn whisper\soundbreak.py", line 34, in <module>
    result = model.transcribe(recording,word_timestamps=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\seshr\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\whisper\transcribe.py", line 133, in transcribe
    mel = log_mel_spectrogram(audio, model.dims.n_mels, padding=N_SAMPLES)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\seshr\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\whisper\audio.py", line 146, in log_mel_spectrogram
    audio = F.pad(audio, (0, padding))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: [enforce fail at alloc_cpu.cpp:80] data. DefaultCPUAllocator: not enough memory: you tried to allocate 846721764000 bytes.
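For anyone hitting the same traceback, here is a minimal sketch of one way to hand the recording to transcribe() without a temporary file. It assumes the cause is the input's shape and sample rate: transcribe() accepts a NumPy array, but it expects 1-D float32 mono audio at 16 kHz, while sd.rec() returns a 2-D (frames, channels) array at whatever rate was recorded. The names to_whisper_input and record_and_transcribe are hypothetical helpers, not part of either library, and the resampling is plain linear interpolation chosen only to avoid extra dependencies.

```python
import numpy as np

WHISPER_SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio


def to_whisper_input(recording, samplerate):
    """Convert a (frames, channels) recording to 1-D float32 audio at 16 kHz.

    Hypothetical helper for illustration. Resampling uses linear
    interpolation, which is crude but needs nothing beyond NumPy.
    """
    audio = np.asarray(recording, dtype=np.float32)
    if audio.ndim == 2:
        # Average the channels down to mono, giving shape (frames,).
        audio = audio.mean(axis=1)
    if samplerate != WHISPER_SAMPLE_RATE:
        duration = audio.shape[0] / samplerate
        n_out = int(round(duration * WHISPER_SAMPLE_RATE))
        old_t = np.arange(audio.shape[0]) / samplerate
        new_t = np.arange(n_out) / WHISPER_SAMPLE_RATE
        audio = np.interp(new_t, old_t, audio)
    return np.ascontiguousarray(audio, dtype=np.float32)


def record_and_transcribe(duration=15, samplerate=44100):
    """Record from the default microphone and transcribe it in memory."""
    # Imported lazily so the helper above can be used without audio hardware.
    import sounddevice as sd
    import whisper

    # Pass samplerate explicitly; otherwise sd.rec() uses the default rate
    # and the frame count no longer corresponds to the intended duration.
    recording = sd.rec(int(duration * samplerate), samplerate=samplerate,
                       channels=1, dtype="float32", blocking=True)
    model = whisper.load_model("tiny")
    result = model.transcribe(to_whisper_input(recording, samplerate),
                              word_timestamps=True, fp16=False)
    return result["text"].strip()
```

Calling print(record_and_transcribe()) would mirror the original script. If the input device supports it, recording directly with samplerate=16000 and channels=1 and then flattening the array would make the resampling step unnecessary.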