I am using the Whisper API in Node.js/Express to generate transcripts in SRT format, but because the audio is so long I have to split it into chunks. The problem is that when the transcript is reassembled, the cue numbers and timestamps are off: they restart at the beginning of every chunk.
At the boundary between two chunks, the reassembled transcript looks like this:
150
00:09:58,780 --> 00:10:00,060
It felt a bit like being saved.


1
00:00:00,000 --> 00:00:05,160
set up on a playdate, but I stopped by nonetheless and that's when I met Suzanne.
It should be:
150
00:09:58,780 --> 00:10:00,060
It felt a bit like being saved.

151
00:00:00,000 --> 00:00:05,160
set up on a playdate, but I stopped by nonetheless and that's when I met Suzanne.
Note that the double blank line in the first example is not a copying mistake: it is produced when I join the transcript chunks (transcription = chunkedTranscripts.join("");), and it makes it hard to simply walk through the file and edit each entry.
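A minimal sketch of one way around the join problem, assuming each chunk is the raw SRT string returned for that chunk (cue number, timing line, one or more text lines). Instead of editing the joined string, it parses every chunk into cues by splitting on one or more blank lines, so an extra blank line between chunks no longer matters, then writes the cues back out numbered 1, 2, 3 and so on across all chunks. The names parseSrt, serializeSrt and mergeSrtChunks are hypothetical helpers, not part of the Whisper API or any library.

```javascript
// Hypothetical helpers (not part of any library): a sketch that assumes
// each chunk is an SRT string of cues shaped like
//   <number>\n<start> --> <end>\n<text lines...>
// separated by one or more blank lines.

// Parse SRT text into cue objects. Splitting on /\n\s*\n/ treats any run
// of blank lines as a single separator, so extra blank lines are harmless.
function parseSrt(text) {
  return text
    .replace(/\r\n/g, "\n") // normalise Windows line endings
    .split(/\n\s*\n/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => {
      const lines = block.split("\n");
      // lines[0] is the chunk-local cue number; it is discarded here
      return { timing: lines[1], text: lines.slice(2).join("\n") };
    });
}

// Serialise cues back to SRT, numbering them 1, 2, 3, ... in order.
function serializeSrt(cues) {
  return cues
    .map((cue, i) => `${i + 1}\n${cue.timing}\n${cue.text}\n`)
    .join("\n");
}

// Merge the per-chunk SRT strings into one continuously numbered transcript.
function mergeSrtChunks(chunkedTranscripts) {
  return serializeSrt(chunkedTranscripts.flatMap((srt) => parseSrt(srt)));
}

// Example usage with two tiny chunks, each numbered from 1:
const merged = mergeSrtChunks([
  "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n",
  "1\n00:00:00,000 --> 00:00:01,500\nAgain\n\n",
]);
console.log(merged);
```

This only makes the numbering continuous; the cues from later chunks still start at 00:00:00,000 unless each chunk's timestamps are shifted by that chunk's start time before merging.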
How can I account for the chunking and produce a correct transcript? The Whisper API doesn't provide a way to do this, even though it limits uploaded audio files to 25 MB.
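To fix the timestamps themselves, each chunk's times have to be shifted by the total duration of the audio chunks that precede it. Below is a minimal sketch, assuming the chunks are contiguous, do not overlap, and that you know each chunk's duration in milliseconds (for example because the audio was split at fixed intervals, or because each chunk file was measured with a tool such as ffprobe). The helpers srtTimeToMs, msToSrtTime, shiftSrt and offsetChunks are hypothetical, and only timing lines of the form HH:MM:SS,mmm --> HH:MM:SS,mmm are rewritten.

```javascript
// Hypothetical helpers (not part of any library): a sketch that assumes the
// chunks are contiguous, do not overlap, and that each chunk's duration in
// milliseconds is known to the caller.

// "HH:MM:SS,mmm" -> milliseconds
function srtTimeToMs(ts) {
  const match = /^(\d+):(\d{2}):(\d{2}),(\d{3})$/.exec(ts);
  if (!match) throw new Error(`Not an SRT timestamp: ${ts}`);
  const [, h, m, s, ms] = match.map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

// milliseconds -> "HH:MM:SS,mmm" (rounded to the nearest millisecond)
function msToSrtTime(totalMs) {
  const rounded = Math.round(totalMs);
  const pad = (n, width) => String(n).padStart(width, "0");
  const h = Math.floor(rounded / 3600000);
  const m = Math.floor(rounded / 60000) % 60;
  const s = Math.floor(rounded / 1000) % 60;
  const ms = rounded % 1000;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Shift the start and end time on every "start --> end" line by offsetMs;
// cue numbers and cue text are left untouched.
function shiftSrt(srt, offsetMs) {
  const timing = /^(\d+:\d{2}:\d{2},\d{3}) --> (\d+:\d{2}:\d{2},\d{3})/gm;
  return srt.replace(timing, (_, start, end) =>
    `${msToSrtTime(srtTimeToMs(start) + offsetMs)} --> ${msToSrtTime(srtTimeToMs(end) + offsetMs)}`
  );
}

// Shift each chunk by the summed duration of all chunks before it.
function offsetChunks(chunkedTranscripts, chunkDurationsMs) {
  let offset = 0;
  return chunkedTranscripts.map((srt, i) => {
    const shifted = shiftSrt(srt, offset);
    offset += chunkDurationsMs[i];
    return shifted;
  });
}

// Example usage: a chunk that starts 10 minutes into the audio.
// The timing line becomes 00:10:00,000 --> 00:10:05,160
console.log(shiftSrt("1\n00:00:00,000 --> 00:00:05,160\nHello\n", 600000));
```

Because the shift is applied to each chunk before the chunks are combined, the cue text is never rewritten; the cue numbers still need to be made continuous afterwards.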