Google Text to Speech: word with minus character returns Timepoint second = 0


What I want to do: Generate an MP3 file together with timepoint information using Google Text-to-Speech.

Context: I'm using Java and the google-cloud-texttospeech library, version 2.4.0. Timepoint is v1beta1.Timepoint.

Problem: When I send the SSML string <speak>Hallo <mark name="p1s0"/>Schmetterlings-Arten.</speak>, the time_seconds of the Timepoint in the response is 0.

What works: When I replace the German word "Schmetterlings-Arten" with "Schmetterlingsarten", everything is OK: the time_seconds of the Timepoint in the response is 0.419, which is fine.

Additional info: I'm using the voice for "language_code" = "de-DE", "name" = "de-DE-Wavenet-E", "audio_encoding" = "MP3" and "enable_time_pointing" = "SSML_MARK".

Question: Is there something special about a minus (hyphen) character inside a word? Do I have to escape it somehow, and if so, how?

What I tried: I used other names for the mark, added spaces, added other tags (e.g. a START mark, an END mark, some breaks), and tried escaping the minus character with a backslash. None of that changed the result.

Answer by Scott B

As of now, this is expected behavior for any German voice in the list of supported Text-to-Speech voices and languages. The Cloud Text-to-Speech team is continually making changes to address issues like this.

A possible workaround for now is to pre-process your input SSML and remove the hyphen (-). In Java, a literal String.replace is enough (no regex needed), as in this sample:

// Literal replacement; note that it also removes hyphens inside tags, e.g. in mark names.
String originalSsml = "<speak>Hallo <mark name='p1s0'/>Schmetterlings-Arten.</speak>";
String updatedSsml = originalSsml.replace("-", "");

Then pass the updated string as the SSML input of your Text-to-Speech request.
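If your mark names or other attribute values may themselves contain hyphens, a blanket replace would change them too, and the mark names in the returned timepoints would no longer match the ones you expect. The following is a minimal sketch of a narrower variant, not an official recommendation: it assumes well-formed SSML (no unescaped "<" in text, no ">" inside attribute values) and only removes hyphens that directly join two letters in the text between tags. The class name SsmlHyphenFilter and the method stripHyphensInText are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SsmlHyphenFilter {

    // Matches either a complete tag (group 1) or a run of text between tags (group 2).
    // Assumption: the SSML is well-formed, so every '<' starts a tag that ends at the next '>'.
    private static final Pattern TAG_OR_TEXT = Pattern.compile("(<[^>]*>)|([^<]+)");

    // Hypothetical helper: removes hyphens that sit directly between two letters,
    // but only in text content. Tags and their attributes are copied unchanged.
    public static String stripHyphensInText(String ssml) {
        Matcher m = TAG_OR_TEXT.matcher(ssml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            if (m.group(1) != null) {
                // A tag such as <mark name="p1-s0"/>: keep it exactly as it is.
                out.append(m.group(1));
            } else {
                // Text content: drop a hyphen only when a letter precedes and follows it.
                out.append(m.group(2).replaceAll("(?<=\\p{L})-(?=\\p{L})", ""));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "<speak>Hallo <mark name=\"p1-s0\"/>Schmetterlings-Arten.</speak>";
        // Prints: <speak>Hallo <mark name="p1-s0"/>SchmetterlingsArten.</speak>
        System.out.println(stripHyphensInText(original));
    }
}
```

A hyphen surrounded by spaces (as in "a - b") is left alone, since it does not join two letters. Note that the result is "SchmetterlingsArten" with a capital "A", whereas the question only confirmed that "Schmetterlingsarten" works, so it is worth checking the returned time_seconds for your own input.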

In addition, you can file a bug report for this issue with German voices and star it if you want to be notified of updates.