Does converting from mulaw to linear impact audio quality?

403 Views Asked by ylvi-bux At 03 January 2022 at 10:45

I want to change the audio encoding from mulaw to linear in order to use a linear speech recognition model from Google. I'm using a telephony channel, so the audio is encoded as mulaw, 8 bits, 8000 Hz. When I use the Google mulaw model, there are issues with recognizing some short single words: they are not recognized at all, and the API returns None. Is it good practice to change the encoding to LINEAR16 or FLAC? I have already done it, but I cannot really measure the degree of improvement.


There are 2 best solutions below

Shipra Sarkar On 04 January 2022 at 09:03

It is generally best practice to use a lossless encoding: LINEAR16 for headerless audio data or FLAC for headered audio data. A sampling rate of 16000 Hz is recommended; otherwise, set sample_rate_hertz to match the native sample rate of the audio source instead of re-sampling. The Google Speech-to-Text API offers several ways to improve recognition, and you can use word-level confidence to measure the accuracy of a response.
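Word-level confidence is only returned when it is requested in the recognition config (enable_word_confidence in the Python client, enableWordConfidence in the REST JSON). The following is a minimal sketch, assuming the response has already been parsed into a dict shaped like the REST speech:recognize JSON. The helper name low_confidence_words, the 0.8 threshold, and the sample response are made up for illustration and are not real API output.

```python
def low_confidence_words(response, threshold=0.8):
    """Return (word, confidence) pairs below `threshold`.

    Only the top alternative of each result is inspected.
    `response` is assumed to look like the REST JSON response.
    """
    flagged = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for info in alternatives[0].get("words", []):
            confidence = info.get("confidence", 0.0)
            if confidence < threshold:
                flagged.append((info["word"], confidence))
    return flagged


# Hypothetical response used only to exercise the helper.
sample_response = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "yes no",
                    "words": [
                        {"word": "yes", "confidence": 0.95},
                        {"word": "no", "confidence": 0.42},
                    ],
                }
            ]
        }
    ]
}

print(low_confidence_words(sample_response))
```

Running the same helper over the mulaw and the transcoded versions of the same recordings is one rough way to compare the two setups, since an empty response simply yields an empty list.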

cherba On 30 January 2022 at 13:49

Ideally the audio would be recorded from the start with a lossless codec such as LINEAR16 or FLAC. But once you have it in a format like mulaw, transcoding it before sending it to Google Speech-to-Text does not help.
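One way to see why transcoding cannot help: decoding mulaw to 16-bit linear is a fixed mapping from each 8-bit code to one sample value, so the decoded audio carries no more information than the mulaw bytes did. Below is a sketch of the standard G.711 mu-law expansion in pure Python; the function name mulaw_to_linear16 is ours, not part of any Google library.

```python
def mulaw_to_linear16(code):
    """Expand one 8-bit G.711 mu-law code to a signed 16-bit sample."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Every possible mu-law byte maps to exactly one linear value.
levels = {mulaw_to_linear16(code) for code in range(256)}
print(len(levels), min(levels), max(levels))
```

The set of output levels is no larger than the set of input codes, so the "linear" file is the same signal in a wider container.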

Consider using model=phone_call and use_enhanced=true for better results on telephony audio. For quick experimentation you can use the STT UI: https://cloud.google.com/speech-to-text/docs/ui-overview.
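As a rough illustration of those settings, here is a sketch that builds a JSON body for the REST speech:recognize method while leaving the audio as mulaw at its native 8000 Hz. The helper name build_recognize_request, the en-US language code, and the placeholder audio are assumptions; authentication and the actual HTTP call are omitted.

```python
import base64
import json


def build_recognize_request(mulaw_bytes, language_code="en-US"):
    """Build a speech:recognize request body for raw 8 kHz mu-law audio."""
    return {
        "config": {
            "encoding": "MULAW",          # send the telephony audio as-is
            "sampleRateHertz": 8000,      # native rate, no re-sampling
            "languageCode": language_code,
            "model": "phone_call",
            "useEnhanced": True,
        },
        "audio": {
            # Raw bytes must be base64-encoded in the JSON body.
            "content": base64.b64encode(mulaw_bytes).decode("ascii"),
        },
    }


# Placeholder input: 8000 bytes of 0xFF is one second of mu-law silence.
body = build_recognize_request(b"\xff" * 8000)
print(json.dumps(body["config"], indent=2))
```

The Python client library takes the same settings under snake_case names (model, use_enhanced, sample_rate_hertz).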
