Python and audio file word detection: timestamps for recognized words are off by 2-3 seconds


I'm trying to use PocketSphinx to find words inside a WAV file. It's a real challenge, since the documentation is really poor (and sometimes nonexistent).

import speech_recognition as sr

# Frames per second used to convert decoder frames to seconds
# (100 is assumed to be the PocketSphinx default)
FRAMES_PER_SECOND = 100

r = sr.Recognizer()
with sr.AudioFile("audiotestcorto.wav") as source:
    # Read the whole file into memory
    audio = r.record(source)

    # show_all=True returns the PocketSphinx decoder instead of plain text
    decoder = r.recognize_sphinx(audio, language="en-US", show_all=True)

    # Print start time, end time (both in seconds), and the word for each segment
    for s in decoder.seg():
        print('| %6.2f s | %6.2f s | %8s |' % (
            s.start_frame / FRAMES_PER_SECOND,
            s.end_frame / FRAMES_PER_SECOND,
            s.word,
        ))

This is my code, and it runs without throwing any error (already an achievement, after some dark days :) ).

The problem is that the timestamps of the words are wrong.

Why are the timestamps wrong? The error is around 2-3 seconds, and it doesn't get better even if I multiply or divide by the frame rate.
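For reference, here is a minimal sketch of the frame-to-seconds conversion I am attempting, written without any PocketSphinx dependency so it can be checked on its own. It assumes the decoder runs at 100 frames per second (which I believe is the PocketSphinx default) and that each segment exposes `word`, `start_frame`, and `end_frame` as in the code above. The names `frames_to_seconds`, `segments_to_timings`, and `WordTiming` are hypothetical helpers, not part of any library.

```python
from typing import Iterable, List, NamedTuple

# Assumption: the decoder uses 100 frames per second; change this if the
# decoder was configured with a different frame rate.
DEFAULT_FRAME_RATE = 100


class WordTiming(NamedTuple):
    """A recognized word with start and end times in seconds."""
    word: str
    start_s: float
    end_s: float


def frames_to_seconds(frame: int, frame_rate: int = DEFAULT_FRAME_RATE) -> float:
    """Convert a decoder frame index to seconds."""
    return frame / frame_rate


def segments_to_timings(segments: Iterable,
                        frame_rate: int = DEFAULT_FRAME_RATE) -> List[WordTiming]:
    """Convert segment objects (with word, start_frame, end_frame) to timings."""
    return [
        WordTiming(
            seg.word,
            frames_to_seconds(seg.start_frame, frame_rate),
            frames_to_seconds(seg.end_frame, frame_rate),
        )
        for seg in segments
    ]
```

With the real decoder this would be called as `segments_to_timings(decoder.seg())`. If the values are still off by a constant amount after this conversion, then the unit conversion is presumably not the cause.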
