I'm working on a Telegram bot that receives voice messages, has OpenAI's Whisper transcribe them, and then responds using OpenAI's chat completions API.
The problem is that Whisper accepts a webm file as input, but not an ogg file. That seems ironic because, from what I've read, a webm container can carry the same Opus audio stream that an ogg file carries.
I can't use ffmpeg, because I'm deploying this as a serverless function (on Vercel for now) and have no guarantee that ffmpeg will be installed there. So I was wondering: since webm is just a container format that can hold Opus audio, would it be possible to take the binary audio data I get from Telegram (const audioData = await response.arrayBuffer()) and simply add some bytes to the beginning and end of it that represent the webm container?
If so, could someone please tell me exactly which bytes I'd need to add?
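For context, here is a minimal sketch of how I'd check which container the downloaded bytes are actually in before trying to wrap them. It relies on two facts: Ogg pages start with the ASCII capture pattern "OggS", and WebM/Matroska files start with the EBML header ID 0x1A 0x45 0xDF 0xA3. The function name sniffContainer is just a hypothetical helper of mine, not part of any library.

```javascript
// Hypothetical helper: guess the container format from the first four bytes.
// Accepts an ArrayBuffer such as the one returned by response.arrayBuffer().
function sniffContainer(audioData) {
  const bytes = new Uint8Array(audioData).subarray(0, 4);

  // True when the buffer begins with the given magic-number bytes.
  const startsWith = (magic) =>
    bytes.length >= magic.length && magic.every((b, i) => bytes[i] === b);

  // "OggS": capture pattern at the start of every Ogg page.
  if (startsWith([0x4f, 0x67, 0x67, 0x53])) return "ogg";

  // EBML header ID that opens a WebM/Matroska file.
  if (startsWith([0x1a, 0x45, 0xdf, 0xa3])) return "webm";

  return "unknown";
}
```

Calling sniffContainer(audioData) on the Telegram voice download should tell me whether the data is already wrapped in an Ogg container (which is what I'd expect) rather than being bare Opus packets.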