I've been trying for weeks to build a conversational model based on Transformers. I've tried three different examples:
- Keras English-to-Spanish translation
- How To Create A Chatbot With Transformers
- Text generation using FNet
In all cases, the model converges correctly on the English-to-Spanish dataset, but when I switch to a dialogue dataset (I've tried several, including the Cornell Movie-Dialogs Corpus) the result is always the same: the loss stays above 4 or 5 and the model never converges.
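For the Cornell data, my preprocessing is roughly the sketch below (the file names and the `+++$+++` separator are from the corpus distribution; the paths are local to my machine, and I pair each conversation line with the one that follows it):

```python
import ast

# movie_lines.txt: lineID +++$+++ charID +++$+++ movieID +++$+++ name +++$+++ text
lines = {}
with open("movie_lines.txt", encoding="iso-8859-1") as f:
    for row in f:
        parts = row.split(" +++$+++ ")
        if len(parts) == 5:
            lines[parts[0]] = parts[4].strip()

# movie_conversations.txt lists each conversation's line IDs in order;
# consecutive lines become (input, reply) training pairs.
pairs = []
with open("movie_conversations.txt", encoding="iso-8859-1") as f:
    for row in f:
        ids = ast.literal_eval(row.split(" +++$+++ ")[-1])
        for a, b in zip(ids, ids[1:]):
            if a in lines and b in lines:
                pairs.append((lines[a], lines[b]))
```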
I've tried the models exactly as shown in the examples, and also many different configurations: varying the number of samples (from 1,000 to 100,000), the batch size (from 1 to 64), the initial learning rate and ReduceLROnPlateau settings, the optimizer, and the number of epochs. The result is always the same: after 10 or 20 epochs, loss and accuracy (both training and validation) level off and the model stops learning.
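For reference, my training configuration looks roughly like this (a representative sketch: `model`, `train_ds`, and `val_ds` come from the linked Keras examples, and the hyperparameter values shown are one point from the ranges I swept):

```python
import tensorflow as tf

# Compile with the loss/metric used in the Keras translation example.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Reduce the learning rate when validation loss stops improving.
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.5, patience=3, min_lr=1e-6
    ),
]

model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=30,  # loss and accuracy flatten around epoch 10-20 regardless
    callbacks=callbacks,
)
```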
What am I doing wrong? Have these examples actually worked for anyone on dialogue data? Is a sequence-to-sequence model trained with sparse categorical cross-entropy loss and an accuracy metric the right setup for conversations?
Thanks