I made a word classifier with nlpnet (http://nilc.icmc.usp.br/nlpnet/index.html). The goal is to extract individual words, selected by their tag, with the given tagger.
Code:
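import codecs
import itertools

import nlpnet

# Load the Portuguese POS tagger; 'pos-pt' is the path to the model directory.
TAGGER = nlpnet.POSTagger('pos-pt', language='pt')


def TAGGER_txt(text):
    # tag() returns one list of (word, tag) pairs per sentence.
    return list(TAGGER.tag(text))


with codecs.open('document.txt', encoding='utf8') as original_file:
    with codecs.open('document_teste.txt', 'w') as output_file:
        for line in original_file:
            print(line)
            words = TAGGER_txt(line)
            # Flatten the per-sentence lists into one list of (word, tag) pairs.
            all_words = list(itertools.chain.from_iterable(words))
            # Keep only the words tagged as nouns ('N'); 'V' would select verbs.
            nouns = [word[0] for word in all_words if word[1] == 'N']
            print(nouns)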
Result
O gato esta querendo comer o ratão
['gato', 'ratão']
I think this could be the essence of what you need. Please see edited version.
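A minimal sketch of that idea, which runs without nlpnet installed. The names Sentence, tagged and nouns are the ones used in the explanation; the contents of tagged are written by hand as an assumption about what the tagger returns (one list of (word, tag) pairs per sentence, with tags such as 'N', 'V' and 'ART'), not real tagger output.

```python
# Sample sentence taken from the question.
Sentence = 'O gato esta querendo comer o ratão'

# Hypothetical tagger output for Sentence: one list of (word, tag) pairs
# per sentence. The tags are hand-written for illustration only.
tagged = [[
    ('O', 'ART'), ('gato', 'N'), ('esta', 'V'), ('querendo', 'V'),
    ('comer', 'V'), ('o', 'ART'), ('ratão', 'N'),
]]

# Walk every sentence and keep only the words whose tag is 'N' (noun).
nouns = [word for sentence in tagged for word, tag in sentence if tag == 'N']
print(nouns)
```

Changing the tag in the condition (for example to 'V') would select a different word class with the same expression.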
As you say in your question, the result of tagging Sentence would be something like tagged. If you wanted just the nouns from Sentence, you could recover them using the expression after nouns =.
Output:
Edit: It's not clear to me what you want. Here's another possibility.
codecs.open..
Output: