I'm testing an app based on the demo available on GitHub, using the Spanish language model. I want it to listen continuously for a small set of keywords and act accordingly, but I'm still an amateur on this subject. My main questions right now are the following:
Given my current setupRecognizer method:
    private void setupRecognizer(File assetsDir) throws IOException {
        recognizer = SpeechRecognizerSetup.defaultSetup()
                .setAcousticModel(new File(assetsDir, "es-ptm"))
                .setDictionary(new File(assetsDir, "es.dict"))
                .setRawLogDir(assetsDir)
                .getRecognizer();
        recognizer.addListener(this);
        File actionGrammar = new File(assetsDir, "actions.list");
        recognizer.addKeywordSearch(SEARCH, actionGrammar);
        File languageModel = new File(assetsDir, "es_model.lm");
        recognizer.addNgramSearch(SEARCH, languageModel);
        startSearch(SEARCH);
    }
What happens when I add both addKeywordSearch and addNgramSearch under the same identifier string ("SEARCH" in my code)? Am I improving the recognition or making it worse?
In a desperate attempt, I reduced the dictionary to only the words I want to be recognized, such as this:
atrás a t r a s
listo l i s t o
listo(2) l i s t a
listo(3) l i s t a s
listos(4) l i s t o s
repetir rr e p e t i r
repetir(2) rr e p e t i d o
repetirse(3) rr e p e t i r s e
It now recognizes only these words, but it misbehaves a lot, identifying words I didn't say. I'm guessing PocketSphinx is probability-based, and since I reduced the dictionary, these words have a high probability of being recognized. Am I correct?
Also, in an attempt to improve accuracy, I made this actions.list:
listo /1.0/
atrás /1.0/
repetir /1.0/
However, I'm not really sure what this value means. The documentation says to use 1e-1 for smaller words, and to increase to 1e-50 for bigger words. What notation is this, and what does it mean?
I'm really concerned about making it as accurate as possible. Am I on the right path?
Thanks in advance!
The n-gram search replaces the keyword search, because both are registered under the same name; the keyword search is then garbage collected.
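To make the replacement concrete, here is a toy model in plain Java. It is not PocketSphinx code: it only assumes, for illustration, that searches are kept in a name-keyed map, and the names "KWS" and "NGRAM" are hypothetical. Registering two searches under one name leaves only the last one, while distinct names keep both.

```java
import java.util.HashMap;
import java.util.Map;

public class SearchRegistrySketch {
    // Toy stand-in for a recognizer's search table (an assumption, not the real API).
    static Map<String, String> register(String keywordName, String ngramName) {
        Map<String, String> searches = new HashMap<>();
        searches.put(keywordName, "keyword search (actions.list)");
        // If ngramName equals keywordName, this put() overwrites the entry above.
        searches.put(ngramName, "n-gram search (es_model.lm)");
        return searches;
    }

    public static void main(String[] args) {
        // Same name, as in the question: only the n-gram search survives.
        System.out.println(register("SEARCH", "SEARCH"));
        // Distinct (hypothetical) names: both searches stay registered.
        System.out.println(register("KWS", "NGRAM").size());
    }
}
```

If the same holds in the real recognizer, giving each search its own name and starting only the one you want would avoid losing the keyword search.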
What is E in floating point?
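As a quick illustration of the notation: 1e-1 means 1 × 10^-1 (that is, 0.1), and 1e-50 means 1 × 10^-50. The sketch below uses only standard Java; the helper name parseThreshold is made up for this example and is not part of PocketSphinx.

```java
public class ScientificNotation {
    // Hypothetical helper: parses a threshold written like the values in a keyword list.
    static double parseThreshold(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        double shortWord = parseThreshold("1e-1");   // 1 x 10^-1
        double longWord = parseThreshold("1e-50");   // 1 x 10^-50
        System.out.println(shortWord == 0.1);        // same value as the literal 0.1
        System.out.println(longWord < shortWord);    // 1e-50 is the far smaller number
        System.out.println(parseThreshold("1.0"));   // the /1.0/ used in actions.list
    }
}
```

Note that, numerically, 1e-50 is much smaller than 1e-1, and the /1.0/ in the list above is larger than both.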