Transcript time offset for chunked audio in the Whisper API?

I am using the Whisper API in Node.js/Express to generate transcripts in SRT format, but because the audio is so long I have to split it into chunks. The problem is that when the transcript is reassembled, the timestamps (and the cue numbers) are off, because each chunk's transcript starts again from 1 and from 00:00:00,000.
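Fixing the offsets comes down to converting each SRT timestamp to milliseconds, adding the start offset of the chunk it came from, and converting back. A minimal sketch, assuming you know the real duration of every audio chunk (for example from whatever tool did the splitting); srtTimeToMs, msToSrtTime, shiftSrtTime and chunkOffsetsMs are hypothetical helper names, not part of the Whisper API:

```javascript
// Convert an SRT timestamp ("HH:MM:SS,mmm") to milliseconds.
function srtTimeToMs(ts) {
  const match = /^(\d{2,}):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!match) throw new Error(`Invalid SRT timestamp: ${ts}`);
  const [, h, m, s, ms] = match.map(Number);
  return ((h * 60 + m) * 60 + s) * 1000 + ms;
}

// Convert an integer number of milliseconds back to "HH:MM:SS,mmm".
function msToSrtTime(totalMs) {
  const pad = (n, width) => String(n).padStart(width, "0");
  const h = Math.floor(totalMs / 3600000);
  const m = Math.floor(totalMs / 60000) % 60;
  const s = Math.floor(totalMs / 1000) % 60;
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(totalMs % 1000, 3)}`;
}

// Shift one timestamp by the start offset of the chunk it belongs to.
function shiftSrtTime(ts, offsetMs) {
  return msToSrtTime(srtTimeToMs(ts) + offsetMs);
}

// Given each chunk's duration in seconds (assumed known), return the
// offset in milliseconds at which each chunk starts in the full audio.
function chunkOffsetsMs(durationsSec) {
  const offsets = [];
  let elapsedSec = 0;
  for (const d of durationsSec) {
    offsets.push(Math.round(elapsedSec * 1000));
    elapsedSec += d;
  }
  return offsets;
}
```

For instance, if the first chunk were exactly 600 seconds long (a hypothetical figure), shiftSrtTime("00:00:05,160", 600000) would yield "00:10:05,160". The offsets should come from the actual chunk durations rather than the end time of the last cue, since a chunk can end in silence after its last subtitle.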

At the boundary between two chunks, the reassembled transcript looks like this:

150
00:09:58,780 --> 00:10:00,060
It felt a bit like being saved.


1
00:00:00,000 --> 00:00:05,160
set up on a playdate, but I stopped by nonetheless and that's when I met Suzanne.

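whereas it should be as follows (the cue number continues from the previous chunk; the timestamps would also need to be shifted by the length of the preceding audio):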

150
00:09:58,780 --> 00:10:00,060
It felt a bit like being saved.

151
00:00:00,000 --> 00:00:05,160
set up on a playdate, but I stopped by nonetheless and that's when I met Suzanne.
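Fixing the numbering on its own is a matter of walking the cues in order and rewriting each index line. A minimal sketch, assuming the combined text uses the standard SRT layout (index line, timing line, text lines, blank-line separators); renumberSrt is a hypothetical helper, and it leaves the timestamps untouched:

```javascript
// Renumber every cue in an SRT string sequentially, starting at 1.
// Runs of several blank lines are treated as a single separator.
function renumberSrt(srt) {
  const blocks = srt
    .replace(/\r\n/g, "\n")
    .split(/\n\s*\n/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0);

  const renumbered = blocks.map((block, i) => {
    const lines = block.split("\n");
    if (/^\d+$/.test(lines[0].trim())) {
      lines[0] = String(i + 1); // replace the existing index
    } else {
      lines.unshift(String(i + 1)); // index line missing: add one
    }
    return lines.join("\n");
  });

  return renumbered.join("\n\n") + "\n";
}
```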

Note that the double blank line in the first example really is there: it appears when I join the transcript pieces (transcription = chunkedTranscripts.join("");), and it makes it hard to simply step through the file and edit each entry.
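The extra blank line presumably comes from each chunk already ending with its own trailing blank line before join("") glues them together. One way to avoid it is to trim every chunk and join with exactly one blank line. A minimal sketch; joinSrtChunks is a hypothetical helper name:

```javascript
// Join SRT chunks so exactly one blank line separates consecutive cues,
// regardless of how much leading/trailing whitespace each chunk has.
function joinSrtChunks(chunks) {
  return (
    chunks
      .map((chunk) => chunk.replace(/\r\n/g, "\n").trim())
      .filter((chunk) => chunk.length > 0) // skip empty chunks
      .join("\n\n") + "\n"
  );
}
```

This only normalizes the separators; the cue numbers and timestamps of later chunks still need to be rewritten.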

How can I account for the chunking and produce a correct transcript? The Whisper API doesn't provide a way to do this, even though it limits uploads to 25 MB per file.
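Putting the pieces together: parse the cues of each chunk's SRT, shift their timestamps by the total duration of all earlier chunks, renumber them, and serialize once. A minimal sketch, assuming chunks[i] is the SRT text returned for audio chunk i and durationsSec[i] is that chunk's real length in seconds (obtained from however the audio was split); mergeSrtChunks is a hypothetical helper, not part of any library:

```javascript
// Merge per-chunk SRT transcripts into one SRT with continuous
// numbering and timestamps offset by the preceding chunks' durations.
function mergeSrtChunks(chunks, durationsSec) {
  if (chunks.length !== durationsSec.length) {
    throw new Error("Need exactly one duration per chunk");
  }

  // "HH:MM:SS,mmm" -> milliseconds
  const toMs = (ts) => {
    const [h, m, rest] = ts.trim().split(":");
    const [s, ms] = rest.split(",");
    return ((Number(h) * 60 + Number(m)) * 60 + Number(s)) * 1000 + Number(ms);
  };

  // milliseconds -> "HH:MM:SS,mmm"
  const toTs = (total) => {
    const pad = (n, w) => String(n).padStart(w, "0");
    const h = Math.floor(total / 3600000);
    const m = Math.floor(total / 60000) % 60;
    const s = Math.floor(total / 1000) % 60;
    return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(total % 1000, 3)}`;
  };

  const cues = [];
  let offsetMs = 0;
  chunks.forEach((chunk, i) => {
    const blocks = chunk.replace(/\r\n/g, "\n").trim().split(/\n\s*\n/);
    for (const block of blocks) {
      const lines = block.trim().split("\n");
      const timingIdx = lines.findIndex((line) => line.includes("-->"));
      if (timingIdx === -1) continue; // not a cue (e.g. an empty chunk)
      const [start, end] = lines[timingIdx].split("-->").map(toMs);
      const text = lines.slice(timingIdx + 1).join("\n");
      const n = cues.length + 1; // continuous numbering across chunks
      cues.push(`${n}\n${toTs(start + offsetMs)} --> ${toTs(end + offsetMs)}\n${text}`);
    }
    offsetMs += Math.round(durationsSec[i] * 1000); // next chunk starts here
  });

  return cues.join("\n\n") + "\n";
}
```

Because cues are rebuilt one by one, the separator problem from join("") disappears as well. Alternatively, requesting response_format "verbose_json" instead of "srt" returns segments with numeric start and end times in seconds, which can be offset directly before building the SRT yourself; check the API reference for the exact response shape.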
