I am using Google's SSML text-to-speech so that my app can speak. The app works fine on my phone, but when I connect the phone to a device over Bluetooth, there is usually a gap or a delay, either at the beginning or in the middle of the speech.
For instance, when the audio is "Hello John, I am your assistant. How can I help you?", the output could be "sistant. How can I help you?". Sometimes the sentences are fluent, but sometimes there are these gaps.
This is how I play the audio file:
String myFile = context.getFilesDir() + "/output.mp3";
mMediaPlayer.reset();
mMediaPlayer.setDataSource(myFile);
mMediaPlayer.prepare();
mMediaPlayer.start();
And this is the entire class of it:
public class Tts {
    public Context context;
    private final MediaPlayer mMediaPlayer;

    public Tts(Context context, MediaPlayer mMediaPlayer) {
        this.context = context;
        this.mMediaPlayer = mMediaPlayer;
    }

    @SuppressLint({"NewApi", "ResourceType", "UseCompatLoadingForColorStateLists"})
    public void say(String text) throws Exception {
        InputStream stream = context.getResources().openRawResource(R.raw.credential); // R.raw.credential is credential.json
        GoogleCredentials credentials = GoogleCredentials.fromStream(stream);
        TextToSpeechSettings textToSpeechSettings =
                TextToSpeechSettings.newBuilder()
                        .setCredentialsProvider(
                                FixedCredentialsProvider.create(credentials)
                        ).build();
        // Instantiates a client
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create(textToSpeechSettings)) {
            // Replace {name} with target
            SharedPreferences sharedPreferences = context.getSharedPreferences("target", Context.MODE_PRIVATE);
            String target = sharedPreferences.getString("target", null);
            text = (target != null) ? text.replace("{name}", target) : text.replace("null", "");
            // Set the text input to be synthesized
            String myString = "<speak><prosody pitch=\"low\">" + text + "</prosody></speak>";
            SynthesisInput input = SynthesisInput.newBuilder().setSsml(myString).build();
            // Build the voice request: select the voice name, the language code ("de-DE")
            // and the SSML voice gender (male)
            VoiceSelectionParams voice =
                    VoiceSelectionParams.newBuilder()
                            .setName("de-DE-Wavenet-E")
                            .setLanguageCode("de-DE")
                            .setSsmlGender(SsmlVoiceGender.MALE)
                            .build();
            // Select the type of audio file you want returned
            AudioConfig audioConfig =
                    AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();
            // Perform the text-to-speech request on the text input with the selected voice parameters and
            // audio file type
            SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);
            // Get the audio contents from the response
            ByteString audioContents = response.getAudioContent();
            // Write the response to the output file.
            try (FileOutputStream out = new FileOutputStream(context.getFilesDir() + "/output.mp3")) {
                out.write(audioContents.toByteArray());
            }
            String myFile = context.getFilesDir() + "/output.mp3";
            mMediaPlayer.setAudioAttributes(new AudioAttributes.Builder().setContentType(AudioAttributes.CONTENT_TYPE_MUSIC).build());
            mMediaPlayer.reset();
            mMediaPlayer.setDataSource(myFile);
            mMediaPlayer.prepare();
            mMediaPlayer.setOnPreparedListener(mediaPlayer -> mMediaPlayer.start());
        }
    }
}
The distance cannot be the reason, since my phone is right next to the device.
Google's SSML needs an internet connection, so I am not quite sure whether the gap is caused by Bluetooth or by the internet connection.
Either way, I am trying to close the gap, no matter what the reason is. The audio should only start playing once it is prepared and ready to be played.
What I tried
This is what I have tried but I don't hear a difference:
mMediaPlayer.setAudioAttributes(new AudioAttributes.Builder().setContentType(AudioAttributes.CONTENT_TYPE_SPEECH).build());
Instead of mMediaPlayer.prepare(), I also tried mMediaPlayer.prepareAsync(), but then the audio is not played (or at least I can't hear it).
Invoking start() in a listener:
mMediaPlayer.setOnPreparedListener(mediaPlayer -> {
    mMediaPlayer.start();
});
Unfortunately, the gap is sometimes still there.
Here is my proposed solution. Check out the // *** comments in the code to see what I changed with respect to the code from your question. Also, take it with a grain of salt, because I have no way of testing it right now.
Nevertheless, as far as I can tell, that is all you can do using the MediaPlayer API. If that still doesn't work right for your Bluetooth device, you should try a different Bluetooth device. If that doesn't help either, maybe you can switch the whole thing to the AudioTrack API instead of MediaPlayer. AudioTrack gives you a low-latency setting, and you could use the audio data directly from the response instead of writing it to a file and reading it back from there.
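As a rough illustration of the AudioTrack route: if the request used AudioEncoding.LINEAR16 instead of MP3, the returned bytes should be 16-bit PCM behind a WAV header, and that header would have to be skipped before the samples go to AudioTrack.write(). The sketch below is plain Java and untested on a device. WavPcm, pcmWithLeadIn and the leadInMs parameter are made-up names, and the optional leading silence is only a guess at a workaround for a Bluetooth link that swallows the first syllables.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of any library): extracts raw PCM from a WAV buffer.
final class WavPcm {
    private WavPcm() {}

    /**
     * Returns the 16-bit PCM payload of a RIFF/WAVE buffer, preceded by
     * leadInMs milliseconds of silence (zero samples).
     * sampleRate and channels must match the audio that was requested.
     */
    static byte[] pcmWithLeadIn(byte[] wav, int sampleRate, int channels, int leadInMs) {
        if (wav.length < 12 || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE buffer");
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk starts right after "RIFF", size, "WAVE"
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            int size = buf.getInt(pos + 4);
            int start = pos + 8;
            boolean sizeUnusable = size < 0 || size > wav.length - start;
            if (id.equals("data")) {
                // If the declared size is unusable, take everything up to the end.
                int end = sizeUnusable ? wav.length : start + size;
                int frameBytes = 2 * channels; // 16-bit samples
                int silence = (int) ((long) sampleRate * leadInMs / 1000) * frameBytes;
                byte[] out = new byte[silence + (end - start)]; // zero-filled = silence
                System.arraycopy(wav, start, out, silence, end - start);
                return out;
            }
            if (sizeUnusable) {
                break; // malformed chunk, stop scanning
            }
            pos = start + size + (size & 1); // chunks are padded to even length
        }
        throw new IllegalArgumentException("no data chunk found");
    }

    private static String tag(byte[] b, int off) {
        return new String(b, off, 4, StandardCharsets.US_ASCII);
    }
}
```

With response.getAudioContent().toByteArray() as input, the result could be passed to an AudioTrack configured for 16-bit PCM at the same sample rate. The sample rate and channel count are assumptions that have to match the request, so it is probably safer to set sampleRateHertz in AudioConfig explicitly than to rely on a default.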
Hope that gets you going!