I want to convert an OGG byte array (bytes) encoded with the Opus codec to a WAV byte array (bytes) without saving to disk. I have downloaded audio from the Telegram API, and it is a byte array in .ogg format. I do not want to save it to the filesystem, to avoid filesystem I/O latency.
Currently, I first save the audio file in .ogg format with the code below, which uses the Telegram API (for reference: https://docs.python-telegram-bot.org/en/stable/telegram.file.html#telegram.File.download_to_drive):
# listen for audio messages
async def audio(update, context):
    newFile = await context.bot.get_file(update.message.voice.file_id)
    await newFile.download_to_drive(output_path)
I then use the code
subprocess.call(["ffmpeg", "-y", "-i", output_path, output_path.replace(".ogg", ".wav")], stderr=subprocess.DEVNULL, stdout=subprocess.DEVNULL)
to convert the OGG file to a WAV file. But this is not what I want.
I want to use the code
async def audio(update, context):
    newFile = await context.bot.get_file(update.message.voice.file_id)
    byte_array = await newFile.download_as_bytearray()
to get byte_array, and I then want to convert this byte_array to WAV without saving to disk and without using ffmpeg. Let me know in the comments if something is unclear. Thanks!
Note: I have set up a Telegram bot on the backend that listens for audio sent to a private chat; I send the audio manually for testing purposes.
We may write the OGG data to the FFmpeg stdin pipe, and read the encoded WAV data from the FFmpeg stdout pipe.
A previous answer of mine describes how to do this with video, and we may apply the same solution to audio.
The example assumes that the OGG data is already downloaded and stored in a bytes array (in RAM).
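Piping architecture: OGG bytes in RAM -> FFmpeg stdin pipe -> FFmpeg sub-process -> FFmpeg stdout pipe -> WAV bytes in RAM.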
The implementation is equivalent to the following shell command:
Linux:
    cat input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
Windows:
    type input.ogg | ffmpeg -y -f ogg -i pipe: -f wav pipe: > test.wav
The example uses the ffmpeg-python module, but it's just a binding to an FFmpeg sub-process (the FFmpeg CLI must be installed, and must be in the execution path).
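Because the binding only spawns the FFmpeg CLI, the same pipe setup can also be sketched with the standard subprocess module alone. This is a minimal sketch, not the ffmpeg-python code: the helper names build_ffmpeg_cmd and start_ffmpeg are made up for illustration, and it assumes ffmpeg is reachable on the execution path.

```python
import subprocess


def build_ffmpeg_cmd(ffmpeg="ffmpeg"):
    """Return the argument list matching the shell command above."""
    return [
        ffmpeg, "-y",
        "-f", "ogg", "-i", "pipe:",  # input: OGG read from stdin
        "-f", "wav", "pipe:",        # output: WAV written to stdout
    ]


def start_ffmpeg(cmd=None):
    """Start the process with stdin and stdout connected to pipes.

    `cmd` is a hypothetical override, e.g. for a non-default FFmpeg path.
    """
    return subprocess.Popen(
        cmd or build_ffmpeg_cmd(),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # hide FFmpeg's log output
    )
```

The returned object exposes the two pipes as its stdin and stdout attributes.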
The FFmpeg sub-process is executed with the stdin pipe as input and the stdout pipe as output.
The input format is set to ogg, and the output format is set to wav (using default encoding parameters).
Assuming the audio file is relatively large, we can't write the entire OGG data at once, because doing so (without "draining" the stdout pipe) causes the program execution to halt: once the stdout pipe buffer fills up, FFmpeg blocks and stops reading from stdin.
We may have to write the OGG data (in chunks) in a separate thread, and read the encoded data in the main thread.
Here is a sample for the "writer" thread:
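A minimal sketch of such a writer, assuming the process object was created with its stdin connected to a pipe (as subprocess.Popen does with stdin=subprocess.PIPE). The function name writer, its parameter names, and the 1024-byte default chunk size are illustrative choices.

```python
import threading


def writer(process, ogg_bytes, chunk_size=1024):
    """Write ogg_bytes to the process's stdin in chunks, then close stdin."""
    for offset in range(0, len(ogg_bytes), chunk_size):
        # Slicing past the end is safe: the final chunk is simply shorter.
        process.stdin.write(ogg_bytes[offset:offset + chunk_size])
    # Closing stdin signals end-of-input to the process.
    process.stdin.close()


# Intended use (the main thread keeps reading process.stdout meanwhile):
#   thread = threading.Thread(target=writer, args=(process, ogg_bytes))
#   thread.start()
```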
The "writer thread" writes the OGG data in small chunks.
The last chunk is smaller (assuming the length is not a multiple of the chunk size).
At the end, the stdin pipe is closed.
Closing stdin finishes encoding the data, and closes the FFmpeg sub-process.
In the main thread, we start the writer thread, and read the encoded WAV data from the stdout pipe (in chunks).
For reading the remaining data, we may use ffmpeg_process.communicate().
Complete code sample:
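A minimal sketch that puts the pieces together, using the standard subprocess module rather than ffmpeg-python. The helper name ogg_to_wav, its cmd parameter, and the chunk size are illustrative assumptions, and ffmpeg is assumed to be on the execution path. Instead of calling communicate(), this sketch keeps reading stdout until end-of-file, which collects the remaining data as well.

```python
import subprocess
import threading

# Same arguments as the shell command: OGG from stdin, WAV to stdout.
FFMPEG_CMD = ["ffmpeg", "-y", "-f", "ogg", "-i", "pipe:", "-f", "wav", "pipe:"]


def _writer(process, data, chunk_size):
    """Background thread: feed the input in chunks, then close stdin."""
    try:
        for offset in range(0, len(data), chunk_size):
            process.stdin.write(data[offset:offset + chunk_size])
        process.stdin.close()  # end-of-input lets the process finish
    except OSError:
        pass  # the process exited early (e.g. broken pipe)


def ogg_to_wav(ogg_bytes, cmd=FFMPEG_CMD, chunk_size=4096):
    """Pipe ogg_bytes through the command and return its stdout as bytes."""
    process = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    thread = threading.Thread(target=_writer,
                              args=(process, ogg_bytes, chunk_size))
    thread.start()

    # Drain stdout in the main thread so the process never blocks on a
    # full output pipe while the writer is still sending input.
    wav = bytearray()
    while True:
        chunk = process.stdout.read(chunk_size)
        if not chunk:  # empty read means end-of-file
            break
        wav += chunk

    thread.join()
    process.stdout.close()
    process.wait()
    return bytes(wav)
```

In the Telegram handler from the question, the call would look like wav_bytes = ogg_to_wav(byte_array). Since this call blocks, running it in an executor may be preferable inside an async handler.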