This is my first Stack Overflow post, and I'm not very good at Python, so I apologize if I get confused about anything.
I want to make a mod for a game that adds lip syncing to a face in the game, but to do that I need to figure out the timings of the different sounds in its 1000+ voicelines. (The game is called "Will You Snail", by the way. It's a great game.) Can I do this with a Python script, and if so, how? I'm an amateur at Python, so I'm crossing my fingers that an existing library can do most of the work. I mainly want to know whether it's possible with a library and, if so, how hard it is.
So basically, for each voiceline I would need to know: A) which sounds the voice made, and B) when in the audio file each individual sound started and stopped.
Thanks!
EDIT: Somebody asked me to narrow down the problem, so here goes: is there a Python library that can take an audio file of someone speaking and output a list of the times each speech sound started and stopped? By a speech sound I mean things like "aah", "ee", "ooh", "oo", "ph", "sh", "ch", etc., basically the basic noises we use to talk. I also need to know WHICH sound was made, not just when.
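To make it more concrete, here is a rough sketch of the kind of output I'm hoping a library could give me for each voiceline, plus how I'd use it to pick the mouth shape at a given moment. The names (`SoundSegment`, `sound_at`) are just placeholders I made up, not from any real library, and the timings in the example are made-up values to show the shape of the data, not real measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SoundSegment:
    """One speech sound in a voiceline (hypothetical format, not a real library's)."""
    sound: str    # which sound was made, e.g. "ee" or "sh"
    start: float  # when it started, in seconds from the start of the file
    end: float    # when it stopped, in seconds from the start of the file


def sound_at(segments: List[SoundSegment], t: float) -> Optional[str]:
    """Return the sound being made at time t (seconds), or None if no sound covers t."""
    for seg in segments:
        # Half-open interval, so a sound that ends exactly when the next starts doesn't overlap it
        if seg.start <= t < seg.end:
            return seg.sound
    return None


# Made-up placeholder timings, only to show the shape of the data
line = [
    SoundSegment("sh", 0.00, 0.12),
    SoundSegment("ee", 0.12, 0.30),
]

print(sound_at(line, 0.2))  # ee
print(sound_at(line, 0.5))  # None
```

If a library could give me a list like `line` for every voiceline, I think I could handle the rest of the mod myself.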