Does converting from mulaw to linear impact audio quality?


I want to change the audio encoding from mulaw to linear so that I can use a linear speech recognition model from Google. I'm using a telephony channel, so the audio is encoded as mulaw, 8 bits, 8000 Hz. When I use the Google mulaw model, some short single words are not recognized at all and the API returns None. Is it good practice to change the encoding to LINEAR16 or FLAC? I have already tried it, but I cannot really measure how much it improves recognition.


There are 2 best solutions below

Shipra Sarkar

It is best practice to use a lossless encoding: LINEAR16 for headerless audio data or FLAC for headered audio data. A sampling rate of 16000 Hz is a good choice when you control the recording; otherwise, set sample_rate_hertz to match the native sample rate of the audio source instead of resampling. To measure the accuracy of a response, you can enable word-level confidence in the Google Speech-to-Text API, which reports a confidence value for each recognized word.
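As a rough illustration of the settings above, here is a minimal sketch that builds a JSON body for the REST `speech:recognize` method and reads per-word confidence back out of a response. The helper names `build_recognize_request` and `word_confidences`, the `en-US` language code, and the 8000 Hz default are assumptions for this example; sending the request and authenticating are left out.

```python
import base64


def build_recognize_request(audio_bytes, encoding="LINEAR16",
                            sample_rate_hertz=8000, language_code="en-US"):
    """Build a request body for the REST speech:recognize method.

    The helper name and the defaults are illustrative assumptions.
    sample_rate_hertz should match the native rate of the audio source.
    """
    return {
        "config": {
            "encoding": encoding,
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
            # Ask for a confidence value on every recognized word.
            "enableWordConfidence": True,
        },
        # The REST API expects the raw audio as base64 text.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


def word_confidences(response):
    """Collect (word, confidence) pairs from the top alternative of each result."""
    pairs = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for info in alternatives[0].get("words", []):
            pairs.append((info.get("word"), info.get("confidence")))
    return pairs
```

Comparing these per-word values between two encodings on the same recordings is one way to get a number for the improvement instead of judging it by ear.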

cherba

Ideally, the audio would be recorded from the start with a lossless codec such as LINEAR16 or FLAC. But once you have it in a format like mulaw, transcoding it before sending it to Google Speech-to-Text is not helpful: the conversion cannot restore the detail that mulaw encoding has already discarded.
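To make the point about transcoding concrete, here is a minimal sketch of standard G.711 mu-law decoding in pure Python. The function names are hypothetical and this is not how Google handles the audio internally; it only shows that each 8-bit code maps to one fixed 16-bit value, so the decoded stream carries no more information than the original.

```python
import struct


def ulaw_to_linear(code):
    """Decode one 8-bit G.711 mu-law code to a signed 16-bit sample."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # 0x84 (132) is the bias added by the encoder; undo it after scaling.
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def decode_ulaw(data):
    """Decode mu-law bytes to little-endian 16-bit PCM bytes (LINEAR16)."""
    samples = [ulaw_to_linear(b) for b in data]
    return struct.pack("<%dh" % len(samples), *samples)


# All 256 codes decode to at most 256 fixed levels (two of them are zero),
# so the 16-bit output never uses more levels than the 8-bit input had.
LEVELS = sorted({ulaw_to_linear(c) for c in range(256)})
```

The output is twice the size of the input, but the set of values it can contain is fixed by the table, which is why transcoding alone is not expected to improve recognition.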

Consider using model=phone_call and use_enhanced=true, which are intended for telephony audio. For quick experimentation, you can use the Speech-to-Text UI: https://cloud.google.com/speech-to-text/docs/ui-overview.
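As a sketch of those two settings, the recognition config for the REST API could look like the following, keeping the audio in its native mulaw format at 8000 Hz. The function name `telephony_config` and the `en-US` default are assumptions for this example, and the enhanced phone_call model may not be available for every language.

```python
def telephony_config(language_code="en-US"):
    """Return a REST-style RecognitionConfig for 8 kHz mulaw phone audio.

    Hypothetical helper; the language code is an assumed default.
    """
    return {
        "encoding": "MULAW",         # send the telephony audio as it is
        "sampleRateHertz": 8000,     # native rate of the phone channel
        "languageCode": language_code,
        "model": "phone_call",       # model=phone_call
        "useEnhanced": True,         # use_enhanced=true
    }
```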