NER - Extract long entities - voice chatbot

135 Views Asked by zakaria hamdane At 26 December 2022 at 00:51

I'm building a voice chatbot that handles a few specific tasks (intents), e.g. translation.
The issue is that some of the entities are long.
Input from user: "translate to German The Eminem Show 20th Anniversary launched earlier this year". I need to extract the following entities:

("German", "LanguageTo")
("The Eminem Show 20th Anniversary launched earlier this year", "text")

I tried training a custom NER model with spaCy, but it does badly on long entities (it doesn't catch the whole "text" entity). "CRF" and "DIETClassifier" within Rasa do better, but still not really well.

Is extracting the long "text" entity perhaps not an NER task at all? I would be delighted to hear any recommendations!

NB: the text I get from the user (as it is a voice chatbot) has no punctuation or casing (the full text is lowercase) and could be much longer than the example I gave.


There are 2 best solutions below

polm23 On 26 December 2022 at 02:53

You're right that this isn't really an NER problem - while in the most general sense NER covers any selection of text from input, many NER models are designed for short proper nouns. A side effect of that is that they're sensitive to where the spans start and end, and have trouble representing long spans.

In the case of spaCy, the spancat component was designed to have less edge sensitivity, and should be a better fit for problems like the one you have. It's still kind of a difficult problem, but should do better than NER.

Backing up a bit, you might want to consider whether you actually need to use a model to find things like the language to translate to - you could just use a list of languages, for example. You could also have an inflexible command structure if you have a small number of well-defined commands.
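As a rough illustration of the suggestion above (a language list plus a fixed command structure), here is a minimal sketch in plain Python. The `LANGUAGES` set, the `COMMAND` pattern, and the `parse_translate` helper are all made up for this example and are not part of spaCy or Rasa. It assumes the command always has the shape "translate to <language> <text>" and that the input is already lowercase with no punctuation, as described in the question.

```python
import re

# Hypothetical, deliberately small list; a real bot would load a fuller one.
LANGUAGES = {"german", "french", "spanish", "italian", "arabic"}

# Assumed fixed command shape: "translate to <language> <text>"
# ("into" is accepted as a variant of "to").
COMMAND = re.compile(r"^translate\s+(?:in)?to\s+(?P<lang>\w+)\s+(?P<text>.+)$")


def parse_translate(utterance):
    """Return (entity, label) pairs, or None if the utterance does not match."""
    match = COMMAND.match(utterance.strip().lower())
    if match is None or match.group("lang") not in LANGUAGES:
        return None
    return [
        (match.group("lang"), "LanguageTo"),
        # Everything after the language is treated as the text to translate,
        # however long it is.
        (match.group("text"), "text"),
    ]


entities = parse_translate(
    "translate to german the eminem show 20th anniversary launched earlier this year"
)
print(entities)
```

Because everything after the language name is taken as the "text" entity, its length doesn't matter here. The trade-off is that any phrasing outside the fixed pattern returns `None` and would need a fallback (for example, a trained model or a clarifying prompt).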

Y G On 28 December 2022 at 08:48

I would recommend using Whisper from OpenAI. It automatically adds punctuation where appropriate, so you could likely use that to separate the entity from the text. You could also use POS tagging from spaCy to detect parts of speech and extract the language.
