Gaps in audio when connecting to a Bluetooth device


I am using SSML so my app can speak. The app works perfectly fine on my phone itself, but when I connect the phone to a device over Bluetooth, there is often a gap or a delay, either at the beginning or in the middle of the speech.

For instance, when the audio is "Hello John, I am your assistant. How can I help you?", the output could be "sistant. How can I help you?". Sometimes the sentences play fluently, but sometimes these gaps occur.

This is how I play the audio file:

String myFile = context.getFilesDir() + "/output.mp3";
mMediaPlayer.reset();
mMediaPlayer.setDataSource(myFile);
mMediaPlayer.prepare();
mMediaPlayer.start();

And this is the entire class of it:

public class Tts {
    public Context context;
    private final MediaPlayer mMediaPlayer;

    public Tts(Context context, MediaPlayer mMediaPlayer) {
        this.context = context;
        this.mMediaPlayer = mMediaPlayer;
    }

    @SuppressLint({"NewApi", "ResourceType", "UseCompatLoadingForColorStateLists"})
    public void say(String text) throws Exception {
        InputStream stream = context.getResources().openRawResource(R.raw.credential); // R.raw.credential is credential.json
        GoogleCredentials credentials = GoogleCredentials.fromStream(stream);
        TextToSpeechSettings textToSpeechSettings =
                TextToSpeechSettings.newBuilder()
                        .setCredentialsProvider(
                                FixedCredentialsProvider.create(credentials)
                        ).build();


        // Instantiates a client
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create(textToSpeechSettings)) {

            // Replace {name} with target
            SharedPreferences sharedPreferences = context.getSharedPreferences("target", Context.MODE_PRIVATE);
            String target = sharedPreferences.getString("target", null);
            text = (target != null) ? text.replace("{name}", target) : text.replace("null", "");

            // Set the text input to be synthesized
            String myString = "<speak><prosody pitch=\"low\">" + text + "</prosody></speak>";
            SynthesisInput input = SynthesisInput.newBuilder().setSsml(myString).build();

            // Build the voice request: German ("de-DE") Wavenet voice, male SSML gender
            VoiceSelectionParams voice =
                    VoiceSelectionParams.newBuilder()
                            .setName("de-DE-Wavenet-E")
                            .setLanguageCode("de-DE")
                            .setSsmlGender(SsmlVoiceGender.MALE)
                            .build();

            // Select the type of audio file you want returned
            AudioConfig audioConfig =
                    AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();

            // Perform the text-to-speech request on the text input with the selected voice parameters and
            // audio file type
            SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

            // Get the audio contents from the response
            ByteString audioContents = response.getAudioContent();

            // Write the response to the output file.
            try (FileOutputStream out = new FileOutputStream(context.getFilesDir() + "/output.mp3")) {
                out.write(audioContents.toByteArray());
            }

            String myFile = context.getFilesDir() + "/output.mp3";
            mMediaPlayer.setAudioAttributes(new AudioAttributes.Builder().setContentType(AudioAttributes.CONTENT_TYPE_MUSIC).build());
            mMediaPlayer.reset();
            mMediaPlayer.setDataSource(myFile);
            mMediaPlayer.prepare();
            mMediaPlayer.setOnPreparedListener(mediaPlayer -> mMediaPlayer.start());
        }
    }
}

The distance cannot be the reason, since my phone is right next to the device.

Google's speech synthesis requires an internet connection, so I am not sure whether the gap is caused by Bluetooth or by the network.

I am trying to close the gap, whatever the reason is: the audio should only start playing once it is fully prepared and ready.

What I tried

This is what I have tried but I don't hear a difference:

mMediaPlayer.setAudioAttributes(new AudioAttributes.Builder().setContentType(AudioAttributes.CONTENT_TYPE_SPEECH).build());

Instead of mMediaPlayer.prepare(), I also tried mMediaPlayer.prepareAsync(), but then the audio is not played at all (or at least I cannot hear it).


Invoking start() in a listener:

mMediaPlayer.setOnPreparedListener(mediaPlayer -> {
    mMediaPlayer.start();
});

Unfortunately, the gap is sometimes still there.
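If I understand the MediaPlayer documentation correctly, with prepareAsync() the listener must be registered before preparation starts; otherwise the callback can fire while no listener is attached and start() is never called. The ordering would be roughly (a sketch, untested):

```java
mMediaPlayer.reset();
mMediaPlayer.setDataSource(myFile);
// Register the listener BEFORE prepareAsync(), so the "prepared"
// callback cannot fire while no listener is attached.
mMediaPlayer.setOnPreparedListener(mediaPlayer -> mediaPlayer.start());
mMediaPlayer.prepareAsync(); // returns immediately; start() runs in the callback
```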


Answer by Frank:

Here is my proposed solution. Check out the // *** comments in the code to see what I changed with respect to your code from the question.

Also take it with a grain of salt, because I have no way of testing that right now.

Nevertheless, as far as I can tell, this is all you can do with the MediaPlayer API. If it still doesn't work right with your Bluetooth device, try a different Bluetooth device. If that doesn't help either, you could switch the whole thing to the AudioTrack API instead of MediaPlayer, which gives you a low-latency setting; you could also feed it the audio data directly from the response instead of writing it to a file and reading it back.

public class Tts {
    public Context context;
    private final MediaPlayer mMediaPlayer;

    public Tts(Context context, MediaPlayer mMediaPlayer) {
        this.context = context;
        this.mMediaPlayer = mMediaPlayer;
    }

    @SuppressLint({"NewApi", "ResourceType", "UseCompatLoadingForColorStateLists"})
    public void say(String text) throws Exception {
        InputStream stream = context.getResources().openRawResource(R.raw.credential); // R.raw.credential is credential.json
        GoogleCredentials credentials = GoogleCredentials.fromStream(stream);
        TextToSpeechSettings textToSpeechSettings =
                TextToSpeechSettings.newBuilder()
                        .setCredentialsProvider(
                                FixedCredentialsProvider.create(credentials)
                        ).build();


        // Instantiates a client
        try (TextToSpeechClient textToSpeechClient = TextToSpeechClient.create(textToSpeechSettings)) {

            // Replace {name} with target
            SharedPreferences sharedPreferences = context.getSharedPreferences("target", Context.MODE_PRIVATE);
            String target = sharedPreferences.getString("target", null);
            text = text.replace("{name}", (target != null) ? target : ""); // *** bug fixed

            // Set the text input to be synthesized
            String myString = "<speak><prosody pitch=\"low\">" + text + "</prosody></speak>";
            SynthesisInput input = SynthesisInput.newBuilder().setSsml(myString).build();

            // Build the voice request: German ("de-DE") Wavenet voice, male SSML gender
            VoiceSelectionParams voice =
                    VoiceSelectionParams.newBuilder()
                            .setName("de-DE-Wavenet-E")
                            .setLanguageCode("de-DE")
                            .setSsmlGender(SsmlVoiceGender.MALE)
                            .build();

            // Select the type of audio file you want returned
            AudioConfig audioConfig =
                    AudioConfig.newBuilder().setAudioEncoding(AudioEncoding.MP3).build();

            // Perform the text-to-speech request on the text input with the selected voice parameters and
            // audio file type
            SynthesizeSpeechResponse response = textToSpeechClient.synthesizeSpeech(input, voice, audioConfig);

            // Get the audio contents from the response
            ByteString audioContents = response.getAudioContent();

            // Write the response to the output file.
            try (FileOutputStream out = new FileOutputStream(context.getFilesDir() + "/output.mp3")) {
                out.write(audioContents.toByteArray());
            }

            String myFile = context.getFilesDir() + "/output.mp3";
            mMediaPlayer.reset();
            mMediaPlayer.setDataSource(myFile);
            mMediaPlayer.setAudioAttributes(new AudioAttributes.Builder() // *** moved here (should be done before prepare and very likely AFTER reset)
                    .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)  // *** changed to speech
                    .setUsage(AudioAttributes.USAGE_ASSISTANT)            // *** added
                    .setFlags(AudioAttributes.FLAG_AUDIBILITY_ENFORCED)   // *** added
                    .build());
            mMediaPlayer.prepare();
            // *** following line changed since handler was defined AFTER prepare and
            // *** the prepare call isn't asynchronous, thus the handler would never be called.
            mMediaPlayer.start();
        }
    }
}
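If you do go the AudioTrack route, here is a rough, untested sketch. It assumes you request AudioEncoding.LINEAR16 instead of MP3 in the AudioConfig; note that LINEAR16 responses include a WAV header that must be skipped, and the 24 kHz mono format used below is an assumption you should verify against the WAV header of the actual response:

```java
// Sketch only: assumes AudioEncoding.LINEAR16 in the AudioConfig.
byte[] wav = response.getAudioContent().toByteArray();

// LINEAR16 responses carry a WAV header; a canonical header is 44 bytes,
// but parse the header properly in real code instead of hard-coding this.
byte[] pcm = java.util.Arrays.copyOfRange(wav, 44, wav.length);

AudioTrack track = new AudioTrack.Builder()
        .setAudioAttributes(new AudioAttributes.Builder()
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .setUsage(AudioAttributes.USAGE_ASSISTANT)
                .build())
        .setAudioFormat(new AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                .setSampleRate(24000) // assumption: read the real rate from the WAV header
                .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
                .build())
        .setTransferMode(AudioTrack.MODE_STATIC)  // whole clip buffered up front
        .setBufferSizeInBytes(pcm.length)
        .setPerformanceMode(AudioTrack.PERFORMANCE_MODE_LOW_LATENCY) // API 26+
        .build();

track.write(pcm, 0, pcm.length); // blocking write of the full buffer
track.play();                    // playback starts only after the data is buffered
```

With MODE_STATIC the entire clip is in memory before play() is called, so there is no file I/O or decoder spin-up between "ready" and "audible", which is exactly the window where your gap seems to occur.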

Hope that gets you going!