PocketSphinx - getHypstr() returns empty for KeyphraseSearch after processRaw


I'm trying to use edu.cmu.sphinx.pocketsphinx with processRaw() to detect a keyword.

I have set up the SpeechRecognizer's decoder directly with getDecoder().setKeyphrase(KWS_SEARCH, KEYPHRASE). I first call decoder.startUtt(), then make a few calls to processRaw() with slices of a PCM buffer, and then call decoder.endUtt(). decoder.hyp() gives me an empty hypothesis.
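For reference, the "slices of a PCM buffer" step can be sketched like this. The Java Decoder.processRaw() takes samples as a short[], so raw audio bytes have to be converted with the right byte order first. This is a minimal sketch assuming 16-bit little-endian mono PCM at the sample rate the acoustic model expects; the class and method names (PcmSlices, toSamples, slice) and the chunk size are hypothetical, not part of PocketSphinx.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PcmSlices {
    // Convert raw 16-bit little-endian PCM bytes into signed samples.
    // A trailing odd byte, if any, is ignored.
    static short[] toSamples(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    // Split samples into consecutive chunks of at most chunkSize samples,
    // e.g. to feed them to processRaw() one call at a time.
    static List<short[]> slice(short[] samples, int chunkSize) {
        List<short[]> chunks = new ArrayList<>();
        for (int start = 0; start < samples.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, samples.length);
            chunks.add(Arrays.copyOfRange(samples, start, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF, 0x00, 0x01};
        short[] samples = toSamples(pcm);
        System.out.println(Arrays.toString(samples)); // [1, -1, 256]
        System.out.println(slice(samples, 2).size()); // 2
    }
}
```

If the bytes come from a WAV file, the header has to be skipped before conversion; feeding big-endian data or the wrong sample rate would plausibly produce exactly the kind of silent non-detection described here.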

I have tried a few different values for setKeywordThreshold, with no luck.

  1. What's missing?

  2. As I understand it, when calling processRaw the last parameter, full_utt, should be false until endUtt is called. Is that correct?

  3. When should full_utt be set to true, and how does it affect recognition?
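On questions 2 and 3: the PocketSphinx C API documentation for ps_process_raw() describes full_utt as marking the block as an entire utterance's worth of data, which may let the recognizer produce more accurate results. When audio is streamed in slices, it is normally left false on every call; as far as I recall, the library's own RecognizerThread also passes false for both flags. The sketch below shows that streaming sequence. KwsDecoder is a hypothetical stand-in interface so the code compiles without the library: the real Decoder exposes hyp().getHypstr() rather than hypStr(), and exact signatures should be checked against your binding version. It also assumes, as something worth verifying in your own code, that setKeyphrase() only registers the named search, so the sketch activates it with setSearch() before startUtt(). Like the demo's partial results, it polls the hypothesis after each chunk instead of only after endUtt().

```java
import java.util.List;

public class KwsFeedSketch {
    // Hypothetical stand-in for the Decoder methods used in the question.
    interface KwsDecoder {
        void setSearch(String name);
        void startUtt();
        int processRaw(short[] data, long nSamples, boolean noSearch, boolean fullUtt);
        void endUtt();
        String hypStr(); // null or empty while nothing has been detected
    }

    // Streams one utterance chunk by chunk and returns the first non-empty
    // hypothesis, or null if the keyphrase was never reported.
    static String detect(KwsDecoder decoder, String searchName, List<short[]> chunks) {
        decoder.setSearch(searchName); // make the keyphrase search the active one
        decoder.startUtt();
        String found = null;
        for (short[] chunk : chunks) {
            // fullUtt = false: each call carries only part of the utterance.
            decoder.processRaw(chunk, chunk.length, false, false);
            String hyp = decoder.hypStr();
            if (hyp != null && !hyp.isEmpty()) {
                found = hyp; // keyphrase spotted mid-stream
                break;
            }
        }
        decoder.endUtt();
        if (found == null) {
            String hyp = decoder.hypStr(); // last chance after the utterance ends
            if (hyp != null && !hyp.isEmpty()) {
                found = hyp;
            }
        }
        return found;
    }
}
```

Setting full_utt to true would only make sense if the whole recording were passed in a single processRaw() call.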

Edit: I should mention that for now I am trying to detect the phrase "oh mighty computer", which is exactly the demo phrase. The SpeechRecognizer's own RecognizerThread recognizes it correctly, yet with processRaw I get no detection. The audio conditions are the same in both attempts.

Thanks.

Nikolay Shmyrev

What's missing?

You missed the tutorial's recommendation:

The threshold must be tuned to balance between false alarms and missed detections. The best way to do this is to use a prerecorded audio file. The common tuning process is the following:

Take a long recording with a few occurrences of your keywords and some other sounds. You can use a movie soundtrack or something similar. The recording should be approximately 1 hour long. Run keyword spotting on that file with different thresholds for every keyword, using the following command:

    pocketsphinx_continuous -infile <your_file.wav> -keyphrase "<your keyphrase>" -kws_threshold <your_threshold> -time yes

The command will print many lines; some of them are keywords with detection times and confidences. You can also send the extra logs to a file with the -logfn your_file.log option to avoid clutter.
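The threshold sweep can be scripted. Here is a minimal sketch that builds one pocketsphinx_continuous invocation per candidate threshold, using only the flags quoted above. It assumes, as is common, that candidates are spaced by powers of ten; the class name, exponent range, and log file naming are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ThresholdSweep {
    // One command (as an argument list) per threshold 1e-fromExp .. 1e-toExp.
    static List<List<String>> commands(String wav, String keyphrase,
                                       int fromExp, int toExp, int step) {
        List<List<String>> cmds = new ArrayList<>();
        for (int e = fromExp; e <= toExp; e += step) {
            String threshold = "1e-" + e;
            cmds.add(Arrays.asList(
                "pocketsphinx_continuous",
                "-infile", wav,
                "-keyphrase", keyphrase, // passed as one argument, no shell quoting needed
                "-kws_threshold", threshold,
                "-time", "yes",
                "-logfn", "kws_" + threshold + ".log")); // keep decoder logs out of stdout
        }
        return cmds;
    }
}
```

Each argument list can be passed to new ProcessBuilder(cmd).redirectOutput(new java.io.File(...)).start(), which keeps every threshold's detections in a separate file. Building argument lists instead of shell strings avoids quoting problems with multi-word keyphrases.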

From your keyword-spotting results, count how many false alarms and missed detections you encountered. Select the threshold with the fewest false alarms and missed detections. For the best accuracy, use a keyphrase with 3-4 syllables; phrases that are too short are easily confused.
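The counting and selection step can be sketched as follows. This assumes you have already extracted detection times (in seconds) from each run's -time yes output and hand-annotated the true keyword times in the recording; parsing is not shown. The matching tolerance and the choice to weight false alarms and misses equally are assumptions, and all names are hypothetical.

```java
import java.util.Map;

public class ThresholdPick {
    // Returns {falseAlarms, misses}: each detection is greedily matched to an
    // unused reference time within toleranceSec; leftovers count as errors.
    static int[] score(double[] detections, double[] references, double toleranceSec) {
        boolean[] used = new boolean[references.length];
        int falseAlarms = 0;
        for (double d : detections) {
            int match = -1;
            for (int i = 0; i < references.length; i++) {
                if (!used[i] && Math.abs(references[i] - d) <= toleranceSec) {
                    match = i;
                    break;
                }
            }
            if (match >= 0) {
                used[match] = true;
            } else {
                falseAlarms++;
            }
        }
        int misses = 0;
        for (boolean u : used) {
            if (!u) misses++;
        }
        return new int[] {falseAlarms, misses};
    }

    // Picks the threshold with the smallest falseAlarms + misses total;
    // ties go to the first entry in the map's iteration order.
    static String best(Map<String, double[]> detectionsByThreshold,
                       double[] references, double toleranceSec) {
        String bestKey = null;
        int bestErrors = Integer.MAX_VALUE;
        for (Map.Entry<String, double[]> e : detectionsByThreshold.entrySet()) {
            int[] s = score(e.getValue(), references, toleranceSec);
            int errors = s[0] + s[1];
            if (errors < bestErrors) {
                bestErrors = errors;
                bestKey = e.getKey();
            }
        }
        return bestKey;
    }
}
```

If false alarms are costlier than misses in your application (or vice versa), weight the two counts differently before summing.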