I want to change the audio encoding from mulaw to linear in order to use a linear speech recognition model from Google. I'm using a telephony channel, so the audio is encoded as mulaw, 8 bits, 8000 Hz. When I use the Google mulaw model, some short single words are not recognized at all: the API returns None. Is it good practice to convert the audio to LINEAR16 or FLAC? I already did it, but I cannot really measure the degree of improvement.
Does converting from mulaw to linear impact audio quality?
Asked by ylvi-bux (403 views)

Answer by cherba:
Ideally the audio would be recorded with a lossless codec such as LINEAR16 or FLAC to begin with. But once it is already in a lossy format like mulaw, transcoding it before sending it to Google Speech-to-Text does not help: the information discarded by the mulaw encoding cannot be recovered.
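To illustrate why transcoding cannot add information, here is a minimal sketch of the standard G.711 mu-law expansion in plain Python. The function and variable names are my own, not part of any Google library. Each 8-bit code maps to one fixed 16-bit value, so the "linear" output still only ever takes the same small set of levels that the mulaw input could represent.

```python
def mulaw_to_linear16(code: int) -> int:
    """Decode one 8-bit G.711 mu-law code to a signed 16-bit PCM sample."""
    code = ~code & 0xFF               # mu-law bytes are stored bit-inverted
    sign = code & 0x80                # top bit: set means negative
    exponent = (code >> 4) & 0x07     # 3-bit segment number
    mantissa = code & 0x0F            # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Decoding is a pure table lookup: 256 possible input codes, nothing more.
DECODE_TABLE = [mulaw_to_linear16(c) for c in range(256)]


def decode(mulaw_bytes: bytes) -> list:
    """Expand a mu-law byte string into a list of 16-bit sample values."""
    return [DECODE_TABLE[b] for b in mulaw_bytes]


# Number of different sample values the decoded audio can ever contain.
distinct_levels = len(set(DECODE_TABLE))
print(distinct_levels, min(DECODE_TABLE), max(DECODE_TABLE))
```

A 16-bit container could hold 65536 levels, but the decoded signal uses only the handful that the table produces, so the recognizer sees the same audio either way.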
Consider using model=phone_call and use_enhanced=true for better telephony quality.
For quick experimentation you can use the Speech-to-Text UI: https://cloud.google.com/speech-to-text/docs/ui-overview
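As a rough sketch of the suggestion above, this is how a request body for the v1 REST speech:recognize method could be built while sending the mulaw audio untouched. The field names follow the v1 RecognitionConfig as I understand it, so please check them against the current documentation. The helper name build_recognize_request and the en-US language code are assumptions for illustration only. Nothing is sent over the network here.

```python
import base64
import json


def build_recognize_request(mulaw_audio: bytes, language_code: str = "en-US") -> dict:
    """Build a speech:recognize JSON body for raw 8 kHz mu-law telephony audio."""
    return {
        "config": {
            "encoding": "MULAW",        # send the audio as-is, no transcoding
            "sampleRateHertz": 8000,    # native telephony sample rate
            "languageCode": language_code,  # assumption: adjust to your language
            "model": "phone_call",      # model tuned for phone audio
            "useEnhanced": True,        # request the enhanced variant
        },
        # The REST API expects inline audio as base64 text.
        "audio": {"content": base64.b64encode(mulaw_audio).decode("ascii")},
    }


# One second of mu-law silence (0xFF decodes to sample value 0) as placeholder audio.
body = json.dumps(build_recognize_request(b"\xff" * 8000))
print(body[:120])
```

The same options exist in the client libraries under snake_case names (model, use_enhanced), as used in the answer above.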
It is best practice to use a lossless codec: LINEAR16 for headerless audio data or FLAC for headered audio data. A sampling rate of 16000 Hz is recommended when you control the recording; otherwise set sample_rate_hertz to match the native sample rate of the audio source (8000 Hz for your telephony audio) instead of re-sampling. To measure whether a change actually helps, you can enable word-level confidence and compare the confidence values returned in the responses.
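One way to put a number on this, sketched under some assumptions: request word-level confidence (enableWordConfidence in the REST config, enable_word_confidence in the Python client), then average the per-word confidence of the top alternative for the same recordings sent in each encoding. The helper names and the sample response below are made up for illustration; only the results / alternatives / words / confidence layout is taken from the API response format.

```python
from statistics import mean
from typing import Optional


def word_confidences(response: dict) -> list:
    """Collect per-word confidence values from the top alternative of each result."""
    confidences = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        for word in alternatives[0].get("words", []):
            if "confidence" in word:
                confidences.append(word["confidence"])
    return confidences


def mean_word_confidence(response: dict) -> Optional[float]:
    """Average word confidence, or None when nothing was recognized."""
    values = word_confidences(response)
    return mean(values) if values else None


# Illustrative response only, not real API output.
sample_response = {
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "yes please",
                    "words": [
                        {"word": "yes", "confidence": 0.5},
                        {"word": "please", "confidence": 1.0},
                    ],
                }
            ]
        }
    ]
}

print(mean_word_confidence(sample_response))
print(mean_word_confidence({}))  # empty response: no words recognized
```

Run over a fixed set of test calls, the share of empty responses and the mean confidence give two simple figures to compare between the MULAW and LINEAR16 requests.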