Lemmatizer/PoS-tagger for Italian in Python

Asked by sunhearth on 18 October 2022 at 18:42 (673 views)

I'm looking for a lemmatizer/PoS-tagger for Italian that works in Python. I tried spaCy: it works, but it's not very accurate, and for verbs in particular it often returns the wrong lemma. NLTK's lemmatizer only supports English. Is there a tool optimized for Italian? If not, is it possible to build one from a corpus, and what work would that involve?
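To give a rough idea of what building one from a corpus involves, here is a minimal sketch of the simplest possible baseline, under stated assumptions: the corpus is in CoNLL-U format (the format used by the Universal Dependencies treebanks, which include Italian ones), the PoS tag of each word is already known, and the "lemmatizer" just memorizes the most frequent lemma for each (form, UPOS) pair. The function names and the toy sample are hypothetical. A usable tool would also need a trained PoS tagger and a statistical model that generalizes to forms never seen in the corpus, which is where most of the real work lies.

```python
from collections import Counter, defaultdict


def train_lookup_lemmatizer(conllu_lines):
    """Map each (lowercased form, UPOS) pair to its most frequent lemma."""
    counts = defaultdict(Counter)
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # blank lines end sentences; '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges ("2-3") and empty nodes ("1.1")
        form, lemma, upos = cols[1], cols[2], cols[3]
        counts[(form.lower(), upos)][lemma] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def lemmatize(table, form, upos):
    """Look up a lemma; unseen (form, UPOS) pairs fall back to the lowercased form."""
    return table.get((form.lower(), upos), form.lower())


# Toy, hypothetical sample in CoNLL-U layout (10 tab-separated columns).
SAMPLE = [
    "# text = Le ragazze mangiavano.",
    "1\tLe\til\tDET\t_\t_\t2\tdet\t_\t_",
    "2\tragazze\tragazza\tNOUN\t_\t_\t3\tnsubj\t_\t_",
    "3\tmangiavano\tmangiare\tVERB\t_\t_\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
    "",
    "# text = Mangiava della frutta.",
    "1\tMangiava\tmangiare\tVERB\t_\t_\t0\troot\t_\t_",
    "2-3\tdella\t_\t_\t_\t_\t_\t_\t_\t_",
    "2\tdi\tdi\tADP\t_\t_\t4\tcase\t_\t_",
    "3\tla\til\tDET\t_\t_\t4\tdet\t_\t_",
    "4\tfrutta\tfrutta\tNOUN\t_\t_\t1\tobj\t_\t_",
    "5\t.\t.\tPUNCT\t_\t_\t1\tpunct\t_\t_",
    "",
]

table = train_lookup_lemmatizer(SAMPLE)
print(lemmatize(table, "Mangiavano", "VERB"))  # mangiare
```

Even this toy shows the limitation that makes the task hard: any inflected verb form absent from the corpus simply comes back unchanged.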



Answer by Nicola Fanelli on 25 October 2022 at 17:21

I ran into the same problem. In my experience, one of the best Italian lemmatizers is TreeTagger. I have preferred it to spaCy's lemmatizer in several projects, and I think it may also be better at PoS tagging. You can also try it online to check whether it suits your use case.

I found it very useful to run it inside my spaCy pipeline, just for lemmatization, so that I could keep the rest of the infrastructure spaCy provides. Here is how you can replace spaCy's lemmatizer with TreeTagger in Python using treetaggerwrapper (the same approach works for the PoS tagger):

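```python
import spacy
from spacy.language import Language
from treetaggerwrapper import TreeTagger

nlp = spacy.load("it_core_news_lg")

# TAGDIR must point to your local TreeTagger installation
TREETAGGER = TreeTagger(TAGDIR="path_to_treetagger", TAGLANG="it")

@Language.component("treetagger")
def treetagger(doc):
    # TreeTagger only sees the non-whitespace tokens produced by spaCy's tokenizer
    tokens = [token.text for token in doc if not token.is_space]

    # tagonly=True: the input is already tokenized, so TreeTagger only tags it.
    # Each output line has the form "word\tPOS\tlemma".
    tags = TREETAGGER.tag_text(tokens, tagonly=True)
    # Ambiguous lemmas come back as "lemma1|lemma2"; keep the first one
    lemmas = [tag.split("\t")[2].split("|")[0] for tag in tags]

    # Copy the lemmas back onto spaCy's tokens, skipping whitespace tokens
    j = 0
    for token in doc:
        if not token.is_space:
            token.lemma_ = lemmas[j]
            j += 1
        else:
            token.lemma_ = " "

    return doc

# Swap the model's built-in lemmatizer for the TreeTagger-based component
nlp.replace_pipe("lemmatizer", "treetagger")
```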
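The core of the component above is parsing TreeTagger's tab-separated output and realigning it with spaCy's tokens. The following stdlib-only sketch isolates that logic so it can be tested without TreeTagger or spaCy installed. It assumes each output line has the form `word\tPOS\tlemma` and uses `str.isspace()` as a stand-in for spaCy's `token.is_space`. It also keeps the PoS column (since the same approach works for the PoS tagger), falls back to the surface form when TreeTagger prints `<unknown>` as the lemma, and raises an error instead of silently misaligning when the counts differ. The function names and sample lines are hypothetical.

```python
def parse_treetagger_lines(lines, unknown="<unknown>"):
    """Turn tab-separated TreeTagger lines (word, POS, lemma) into triples."""
    triples = []
    for line in lines:
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"unexpected TreeTagger line: {line!r}")
        word, pos, lemma = fields[:3]
        lemma = lemma.split("|")[0]  # ambiguous lemmas look like "a|b": keep the first
        if lemma == unknown:
            lemma = word  # TreeTagger prints <unknown> when it has no lemma
        triples.append((word, pos, lemma))
    return triples


def align_lemmas(token_texts, triples):
    """Give whitespace tokens a single space and every other token its lemma, in order."""
    non_space = [t for t in token_texts if not t.isspace()]
    if len(non_space) != len(triples):
        raise ValueError(f"{len(non_space)} tokens but {len(triples)} tagged lines")
    lemmas = iter(lemma for _, _, lemma in triples)
    return [" " if t.isspace() else next(lemmas) for t in token_texts]


# Hypothetical sample of TreeTagger output for the tokens below.
tags = [
    "Le\tDET:def\til",
    "ragazze\tNOM\tragazza",
    "sono\tVER:pres\tessere",
    "state\tVER:pper\tessere|stare",
    "blorf\tNOM\t<unknown>",
]
tokens = ["Le", "ragazze", "\n", "sono", "state", "blorf"]

print(align_lemmas(tokens, parse_treetagger_lines(tags)))
# ['il', 'ragazza', ' ', 'essere', 'essere', 'blorf']
```

Inside the spaCy component, each returned lemma could then be assigned with `token.lemma_`, and the PoS column from the triples with `token.tag_`.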

This could be a useful temporary solution.
