How to stem tokens using list comprehension?


I extracted a series of texts from an XML file (with BeautifulSoup) and stored them in a list of strings, where each string is one text. Now I want to transform that list with a list comprehension so that it becomes a list of lists, where each inner list contains the lowercased, stemmed words of one text, without punctuation.

The problem is threefold:

a) I can't remove the " " element (I tried using if word != " ", but it had no effect).

b) When I use the string library to remove punctuation, things like 26-year-old turn into 26yearold. How can I avoid that while still removing punctuation (with string)?

c) wasn't turns into wasnt.

This is the list where I am storing everything. I want to remove the " " element and find a better way to parse phrases containing "-".

list_of_texts = [[stem(word.lower().translate(word.maketrans('', '', string.punctuation)).replace("\n", " ")) for word in text.split()] for text in list_of_texts]
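One possible way to address all three points is to extract tokens with a regular expression instead of deleting punctuation character by character. The sketch below is not a definitive fix. It assumes that the unwanted elements are most likely empty strings left behind when a token consists only of punctuation (such as a lone "-"), and that hyphens and straight apostrophes should be kept only when they sit between letters or digits. The names TOKEN_RE, toy_stem and tokenize_and_stem are hypothetical, and toy_stem is only a stand-in so the example runs on its own; pass your actual stem function instead.

```python
import re

# Letters/digits, optionally joined by single internal hyphens or straight
# apostrophes. A token must start and end with a letter or digit, so a lone
# "-" or "..." never produces a token (and therefore no empty element).
TOKEN_RE = re.compile(r"[a-z0-9]+(?:[-'][a-z0-9]+)*")


def toy_stem(word):
    """Hypothetical placeholder stemmer: drops a trailing 's' from longer words."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def tokenize_and_stem(texts, stem=toy_stem):
    """Turn a list of strings into a list of lists of lowercased, stemmed tokens."""
    return [
        [stem(token) for token in TOKEN_RE.findall(text.lower())]
        for text in texts
    ]


sample = ["The 26-year-old wasn't there.\nHe walks - quickly!"]
print(tokenize_and_stem(sample))
# [['the', '26-year-old', "wasn't", 'there', 'he', 'walk', 'quickly']]
```

Because the pattern only matches straight apostrophes, text containing typographic apostrophes (’) would need them normalized first, for example with text.replace("’", "'"). How a real stemmer treats tokens such as wasn't or 26-year-old depends on the stemmer, so it is worth checking its output on a few of those cases.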