Separating User Speech from Chatbot Speech in Real-Time Audio Streams with Twilio and Google Speech-to-Text


I'm currently working on an application in which a chatbot interacts with users through Twilio Media Streams. The primary challenge I'm encountering is accurately differentiating user speech from chatbot speech within the incoming audio stream. The goal is to ensure that certain actions are triggered exclusively during user speech, without interference from the chatbot's own responses.

Currently, I'm using Google Speech-to-Text to transcribe user speech, but transcription quality deteriorates when the chatbot's voice is present in the audio stream alongside the user's voice. I'd like to improve accuracy by either isolating the chatbot's voice from the audio bytes before transcription, or by suppressing certain actions while the chatbot is speaking.

For reference, the transcription call (where incoming_speech is the raw audio bytes):

    incoming_text = await get_text_from_audio(incoming_speech)
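For context, Twilio Media Streams deliver audio over a WebSocket as JSON messages whose "media" events carry base64-encoded 8 kHz mu-law payloads. A minimal decoder for producing the raw bytes passed to the transcription step might look like this (the function name is hypothetical; the JSON shape follows Twilio's media event format):

```python
import base64
import json

def decode_media_message(message: str) -> bytes:
    """Extract raw mu-law audio bytes from a Twilio Media Streams
    'media' event; non-media events yield no audio."""
    event = json.loads(message)
    if event.get("event") != "media":
        return b""
    # Twilio encodes each audio frame as base64 inside media.payload.
    return base64.b64decode(event["media"]["payload"])
```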

Tech Stack:

  • Twilio for voice interactions
  • Google Speech-to-Text for transcription
  • Text-to-speech for chatbot responses

Current Approach:

User speech is sent to Google Speech-to-Text for transcription. However, when the chatbot's voice overlaps with the user's voice in the audio stream, transcription accuracy degrades. The application is built with FastAPI.
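The pipeline described above can be sketched roughly as follows. get_text_from_audio is stubbed here in place of the real Google Speech-to-Text call, and the buffering threshold is an arbitrary placeholder, not a recommended value:

```python
import asyncio

CHUNK_THRESHOLD = 3200  # ~400 ms of 8 kHz mu-law audio; placeholder value

async def get_text_from_audio(audio: bytes) -> str:
    """Stub standing in for the real Google Speech-to-Text call."""
    return f"<{len(audio)} bytes transcribed>"

async def transcribe_stream(chunks) -> list:
    """Buffer incoming audio chunks and transcribe once enough audio
    has accumulated (a simplified, non-streaming sketch)."""
    buffer = b""
    results = []
    async for chunk in chunks:
        buffer += chunk
        if len(buffer) >= CHUNK_THRESHOLD:
            results.append(await get_text_from_audio(buffer))
            buffer = b""
    if buffer:
        # Flush whatever audio remains when the stream ends.
        results.append(await get_text_from_audio(buffer))
    return results
```

In the real application the chunks would come from the Twilio WebSocket handler, and the stub would be replaced by the actual transcription call.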

Specific Challenges:

How can I remove the chatbot's voice from the audio bytes before sending them to Google Speech-to-Text, so that transcription remains accurate? Alternatively, how can I prevent specific actions from executing while the chatbot is speaking? Perhaps I could detect the frequency profile of the Twilio TTS voice and filter it out.
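To illustrate the second idea, one sketch of "prevent actions while the chatbot is speaking" is a simple gate keyed off the length of the TTS clip being played back. The class and method names are hypothetical, and estimating playback end from clip duration is an assumption; Twilio Media Streams also support "mark" events that report when queued outbound audio has finished playing, which could drive the same flag more precisely:

```python
import time

class BotSpeechGate:
    """Suppress user-triggered actions while the chatbot's TTS audio
    is presumed to be playing (a sketch, not a Twilio API)."""

    def __init__(self):
        self._speaking_until = 0.0

    def bot_started(self, duration_s: float) -> None:
        # Estimate when playback ends from the TTS clip length.
        self._speaking_until = time.monotonic() + duration_s

    def allow_user_action(self) -> bool:
        # Only allow actions once the estimated playback window has passed.
        return time.monotonic() >= self._speaking_until
```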
