I'm trying to lemmatize German texts that are stored in a DataFrame.
I use the german library, which handles German-specific grammatical structures: https://github.com/jfilter/german-preprocessing
My code:
import pandas as pd
from german import preprocess

df = pd.read_csv('Afd.csv', sep=',')
Lemma = open('MessageAFD_lemma.txt', 'w')
for i in df['message']:
    preprocess(i, remove_stop=True)
    Lemma.write(i)
Lemma.close()
The script runs without any errors in the terminal, but when I open the file "MessageAFD_lemma.txt", I find the original text unchanged (nothing was lemmatized).
The expected result is like:
Input:
preprocess(['Johannes war einer von vielen guten Schülern.', 'Julia trinkt gern Tee.'], remove_stop=True)
Output:
['johannes gut schüler', 'julia trinken tee']
What is going wrong?
The preprocess function returns a copy of the texts instead of modifying the input. So you need to write the result of preprocess to the file, not the original messages i.

Furthermore, preprocess accepts a list of texts to process, so you must wrap your message in [message] and extract the single result from the returned list with result, = ...
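
Putting both points together, a minimal sketch of the corrected loop could look like the following. The helper names lemmatize_messages and write_lemmas are made up for illustration, and writing one message per line with UTF-8 encoding is my assumption, not something the question specifies. The only library call used is preprocess(texts, remove_stop=True), as shown in the question.

```python
from typing import Callable, Iterable, List


def lemmatize_messages(
    messages: Iterable[str],
    preprocess_fn: Callable[..., List[str]],
) -> List[str]:
    """Return one lemmatized string per input message (hypothetical helper)."""
    lemmatized = []
    for message in messages:
        # preprocess takes a list of texts and returns a new list,
        # so wrap the message and unpack the single result.
        result, = preprocess_fn([message], remove_stop=True)
        lemmatized.append(result)
    return lemmatized


def write_lemmas(csv_path: str, out_path: str) -> None:
    """Read the 'message' column and write the lemmatized texts to a file."""
    # Imported here so the helper above can be tried without these packages.
    import pandas as pd
    from german import preprocess

    df = pd.read_csv(csv_path, sep=',')
    lemmas = lemmatize_messages(df['message'], preprocess)
    # Assumption: one message per line, UTF-8 so umlauts survive.
    with open(out_path, 'w', encoding='utf-8') as out_file:
        for line in lemmas:
            out_file.write(line + '\n')
```

With the file names from the question, this would be called as write_lemmas('Afd.csv', 'MessageAFD_lemma.txt'). The key change is that the returned value is what gets written, not the loop variable.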