What I want to do: I'd like to generate an MP3 file and timepoint information with Google Text-to-Speech.
Context: I'm using Java and the google-cloud-texttospeech library in version 2.4.0. Timepoint is v1beta1.Timepoint.
Problem: When I send the SSML string <speak>Hallo <mark name="p1s0"/>Schmetterlings-Arten.</speak> then the time_seconds in Timepoint in the response is 0.
What works: When I replace the German word "Schmetterlings-Arten" with "Schmetterlingsarten", everything is OK and the time_seconds in Timepoint in the response is 0.419, which is fine.
Additional info: I'm using the voice for "language_code" = "de-DE", "name" = "de-DE-Wavenet-E", "audio_encoding" = "MP3" and "enable_time_pointing" = "SSML_MARK".
Question: Is there something special about a minus character in a word? Do I have to escape it somehow (and if so: how)?
What I tried: I tried some other names for that mark, added some spaces, added some other tags (e.g. START mark, END mark, some breaks, ...), and tried to escape the minus character with a backslash. None of that changed the result.
As of now, this is expected behavior when using any of the German voices from the supported voices and languages of Text-to-Speech. The Cloud Text-to-Speech team is constantly making changes to improve issues like this.
A possible workaround for now is to pre-process your input SSML and remove the hyphen (-) with a regex replace in Java, then pass the updated string as the input SSML in the request body of your Text-to-Speech call.
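A minimal sketch of such a pre-processing step, using only java.util.regex. The class and method names (SsmlHyphenWorkaround, stripHyphens) are hypothetical, and the pattern is an assumption about what counts as an "in-word" hyphen: it only removes a hyphen that sits directly between two letters and is not inside a tag, so attribute values such as mark names are left alone. The casing of the second word part is not changed, so "Schmetterlings-Arten" becomes "SchmetterlingsArten"; whether that is pronounced acceptably would need to be checked against the voice.

```java
import java.util.regex.Pattern;

final class SsmlHyphenWorkaround {

    // Hyphen with a letter on both sides, and no '>' before the next '<',
    // i.e. the hyphen is in text content rather than inside a tag.
    // This pattern is an illustrative assumption, not an official recommendation.
    private static final Pattern IN_WORD_HYPHEN =
            Pattern.compile("(?<=\\p{L})-(?=\\p{L})(?![^<]*>)");

    // Returns the SSML with in-word hyphens removed from text content.
    static String stripHyphens(String ssml) {
        return IN_WORD_HYPHEN.matcher(ssml).replaceAll("");
    }

    public static void main(String[] args) {
        String ssml = "<speak>Hallo <mark name=\"p1s0\"/>Schmetterlings-Arten.</speak>";
        // Prints: <speak>Hallo <mark name="p1s0"/>SchmetterlingsArten.</speak>
        System.out.println(stripHyphens(ssml));
    }
}
```

The returned string would then be used wherever the original SSML was set on the request input.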
In addition, you may also file a bug for this specific issue with German voices, and star it if you want to be notified about updates to the filed bug.