Torch error and warnings (TTS code with a Hugging Face model)

98 Views Asked by Milos Cecaric At 15 January 2024 at 14:30

I have the following Python code:

from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
from datasets import load_dataset
import torch
import torchaudio
import soundfile as sf
import speechbrain as sb
from speechbrain.pretrained import SpeakerRecognition

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

text_input = input("Enter text in English: ")

inputs = processor(text=text_input, return_tensors="pt")

spk_rec = SpeakerRecognition.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb", savedir="pretrained_models/spkrec-ecapa-voxceleb")

embeddings_dataset = load_dataset("vctk", trust_remote_code=True)

wav = sb.dataio.dataio.read_audio(embeddings_dataset['train'][5841]['file'])

speaker_embeddings = spk_rec.encode_batch(wav.unsqueeze(0))

speaker_embeddings = speaker_embeddings.squeeze(0)

num_tokens = inputs["input_ids"].shape[1]

speaker_embeddings = speaker_embeddings.unsqueeze(1).unsqueeze(2).unsqueeze(3).unsqueeze(4).expand(1, 1, 1, 1, num_tokens, 192)

spectrogram = model.generate_speech(inputs["input_ids"], speaker_embeddings)

with torch.no_grad():
    speech = vocoder(spectrogram)

sf.write("output.wav", speech.numpy(), samplerate=16000)

but it produces warnings and an error. When I run the code, I first get these warnings:

The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
C:\Users\mceca\AppData\Roaming\Python\Python310\site-packages\speechbrain\utils\torch_audio_backend.py:22: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
C:\Users\mceca\AppData\Roaming\Python\Python310\site-packages\speechbrain\utils\torch_audio_backend.py:22: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend("soundfile")
C:\Users\mceca\Desktop\py.py:9: UserWarning: torchaudio._backend.set_audio_backend has been deprecated. With dispatcher enabled, this function is no-op. You can remove the function call.
  torchaudio.set_audio_backend('soundfile')
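
A note on these warnings: they are deprecation notices, and by their own wording the call is a no-op, so they should not stop the script. The third one points at py.py:9, which suggests the script itself also calls torchaudio.set_audio_backend('soundfile'); removing that call, as the message recommends, should make that one go away. For the ones raised inside speechbrain, one option is to filter that specific message with the standard-library warnings module. This is only a minimal sketch: the pattern is copied from the log above, silence_backend_warning is a hypothetical helper name, and it assumes the filter is installed before speechbrain is imported. The plain "The torchaudio backend is switched to 'soundfile'" lines may be ordinary printed output rather than warnings, in which case this filter would not affect them.

```python
import warnings

# The pattern is a regex matched against the start of the warning message.
# It is copied from the UserWarning text shown in the log.
DEPRECATION_PATTERN = r"torchaudio\._backend\.set_audio_backend has been deprecated"


def silence_backend_warning():
    """Hide only the set_audio_backend deprecation notice (hypothetical helper)."""
    warnings.filterwarnings(
        "ignore",
        message=DEPRECATION_PATTERN,
        category=UserWarning,
    )


# Assumption: call this before `import speechbrain`, since the warning
# appears to be raised while that package is being imported.
silence_backend_warning()
```

Filtering by message rather than ignoring every UserWarning keeps other, possibly useful, warnings visible.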

Then, when I enter text, I get this error:

Traceback (most recent call last):
  File "C:\Users\mceca\Desktop\py.py", line 33, in <module>
    with torch.no_grad():
  File "C:\Users\mceca\AppData\Roaming\Python\Python310\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\mceca\AppData\Roaming\Python\Python310\site-packages\transformers\models\speecht5\modeling_speecht5.py", line 2921, in generate_speech
    return _generate_speech(
  File "C:\Users\mceca\AppData\Roaming\Python\Python310\site-packages\transformers\models\speecht5\modeling_speecht5.py", line 2521, in _generate_speech
    decoder_hidden_states = model.speecht5.decoder.prenet(output_sequence, speaker_embeddings)
  File "C:\Users\mceca\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\mceca\AppData\Roaming\Python\Python310\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\mceca\AppData\Roaming\Python\Python310\site-packages\transformers\models\speecht5\modeling_speecht5.py", line 700, in forward
    speaker_embeddings = speaker_embeddings.expand(-1, inputs_embeds.size(1), -1)
RuntimeError: expand(torch.FloatTensor{[1, 1, 1, 1, 1, 8, 192]}, size=[-1, 1, -1]): the number of sizes provided (3) must be greater or equal to the number of dimensions in the tensor (7)

This is my first time working with torch, torchaudio, and a Hugging Face TTS model. Please modify my code and describe your changes (do not only describe them).
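
The traceback points at the shape of speaker_embeddings: the decoder prenet calls expand with three sizes, but the tensor it receives has seven dimensions. The chain of unsqueeze calls plus expand in the script builds a 6-D tensor, and the model then adds one more dimension itself. The sketch below reproduces only the shape bookkeeping with a random stand-in tensor, so no model or dataset is downloaded. It assumes that encode_batch returns shape (1, 1, 192), as the traceback implies, and that generate_speech wants a 2-D tensor of shape (1, embedding_dim); the unsqueeze(1) that mimics the prenet is inferred from the traceback, not copied from the library source.

```python
import torch

# Stand-in for spk_rec.encode_batch(wav.unsqueeze(0)); assumed shape (1, 1, 192).
raw = torch.randn(1, 1, 192)

# What the script does: squeeze once, then add four extra dimensions.
too_many_dims = raw.squeeze(0).unsqueeze(1).unsqueeze(2).unsqueeze(3).unsqueeze(4)
print(too_many_dims.shape)  # torch.Size([1, 1, 1, 1, 1, 192])

# Likely intended input: one normalized vector per utterance, 2-D.
speaker_embeddings = torch.nn.functional.normalize(raw, dim=2).squeeze(1)
print(speaker_embeddings.shape)  # torch.Size([1, 192])

# Mimic the failing prenet step on the 2-D tensor: add one dimension,
# then expand with three sizes. With 3 dims this no longer raises.
num_frames = 8  # illustrative value taken from the traceback
expanded = speaker_embeddings.unsqueeze(1).expand(-1, num_frames, -1)
print(expanded.shape)  # torch.Size([1, 8, 192])
```

In the script this would mean dropping the unsqueeze/expand line entirely and passing the (1, embedding_dim) tensor to model.generate_speech, ideally together with vocoder=vocoder inside torch.no_grad(). One further caveat, stated as an assumption to verify: the microsoft/speecht5_tts checkpoint is usually shown with 512-dimensional x-vector speaker embeddings (for example from speechbrain/spkrec-xvect-voxceleb), while the ECAPA model used here produces 192 values, so a size mismatch may still appear after the dimension count is fixed.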
