Remove a list of words from a string using Python


I am trying to remove a list of words from a string using Python. I tried the code below, but it leaves extra spaces where the words were removed. Is there an approach that removes only the words present in the list? Please give me some advice.

words_to_remove=['gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah']

test_data = """RegExr Yeah was created by gskinner.com.
yippe, ow, ouch, gosh Yeah oh, goodness, oh well, oh no, how can I do wonders in this world. Yep, it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
"""

# Remove words
for word in words_to_remove:
    test_data = test_data.replace(word, '')

test_data
Out[46]: 'RegExr  was created by gskinner.com.\nyippe, , ouch,   , ,  ,  , h can I do wonders in this world. , it is out of the world.\nEdit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\nThe side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\nExplore results with the Tools bel. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.\n'

There are 4 best solutions below

1
吴慈霆 On
words_to_remove = ['gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah']

test_data = """RegExr Yeah was created by gskinner.com.
yippe, ow, ouch, gosh Yeah oh, goodness, oh well, oh no, how can I do wonders in this world. Yep, it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
"""

splitted = test_data.split(' ')
filtered = list(filter(lambda word: word not in words_to_remove, splitted))
print(' '.join(filtered))

2
sahasrara62 On

Strings are immutable, so don't call replace in a loop: each call builds a new string again and again. Split the text once and filter the words against a set instead.

words_to_remove = {'gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah'}

test_data = """RegExr Yeah was created by gskinner.com.
yippe, ow, ouch, gosh Yeah oh, goodness, oh well, oh no, how can I do wonders in this world. Yep, it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
"""
new_data = ' '.join(i for i in test_data.split() if (i and i not in words_to_remove))
print(new_data)

Output:

RegExr was created by gskinner.com. yippe, ow, ouch, oh, goodness, well, no, how can I do wonders in this world. Yep, it is out of the world. Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode. The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns. Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
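A side effect of split() followed by ' '.join(...) is that line breaks are collapsed into spaces, as the output above shows. If the line structure matters, one option is to filter each line separately. This is only a rough sketch: the helper name remove_words and the short sample string are made up for illustration, and tokens with attached punctuation such as "oh," are still kept, exactly as in the code above.

```python
def remove_words(text, words):
    """Drop whitespace-separated tokens that exactly match an entry in `words`.

    Hypothetical helper for illustration; keeps one output line per input line.
    """
    blocked = set(words)  # set membership tests are O(1) on average
    cleaned_lines = []
    for line in text.splitlines():
        # split() with no argument also swallows repeated spaces
        kept = [token for token in line.split() if token not in blocked]
        cleaned_lines.append(' '.join(kept))
    return '\n'.join(cleaned_lines)


sample = "RegExr Yeah was created\ngosh Yeah oh, how are you"
print(remove_words(sample, ['gosh', 'oh', 'Yeah']))
# RegExr was created
# oh, how are you
```

Note that splitlines() drops a trailing newline, so add one back if you need it.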
1
Chris On

If you just want to remove the offending words, you might use regular expressions and compile a pattern from your list of words to remove.

>>> import re
>>> r = re.compile(rf"\b(?:{'|'.join(words_to_remove)})\b")
>>> r.sub('', test_data)
'RegExr  was created by gskinner.com.\nyippe, , ouch,   , ,  ,  , how can I do wonders in this world. , it is out of the world.\nEdit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\nThe side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\nExplore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.\n'

Now this clearly doesn't address the leftover punctuation, but you can probably solve that with another regex. Here is an initial take that you can likely improve:

>>> re.sub(r'([,.:;?]\s?)[\s,.:;?]*', r'\1', r.sub('', test_data))
'RegExr  was created by gskinner.com.\nyippe, ouch, how can I do wonders in this world. it is out of the world.\nEdit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.\nThe side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.\nExplore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.\n'
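The two substitutions above could possibly be folded into a single pattern that removes each word together with any punctuation and spaces that directly follow it. The following is a sketch under assumptions, not a definitive fix: strip_words is a hypothetical helper name, the sample string is shortened, and re.escape is added only so that a word containing regex metacharacters would not break the pattern.

```python
import re


def strip_words(text, words):
    """Remove whole-word matches plus trailing punctuation and spaces.

    Hypothetical helper for illustration only.
    """
    # Escape each word so characters like '.' or '+' are matched literally
    alternatives = '|'.join(re.escape(w) for w in words)
    # \b ... \b : match whole words only, so 'ow' does not hit 'how' or 'below'
    # [,.:;?]*  : swallow punctuation attached to the removed word
    # [ \t]*    : swallow following spaces/tabs but leave newlines alone
    pattern = re.compile(rf"\b(?:{alternatives})\b[,.:;?]*[ \t]*")
    return pattern.sub('', text)


sample = "yippe, ow, ouch, gosh Yeah oh, how can I do wonders. Yep, it is below.\n"
print(strip_words(sample, ['gosh', 'no', 'oh', 'Yep', 'ow', 'Yeah']))
# yippe, ouch, how can I do wonders. it is below.
```

One trade-off to be aware of: a removed word at the end of a sentence takes its full stop with it, so you may want to narrow the punctuation class to commas only.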
0
Arifa Chan On

You can try stripping commas from each word with strip(',') before checking whether it is in words_to_remove:

words_to_remove=['gosh', 'no', 'oh', 'Yep', 'ow', 'well', 'goodness', 'Yeah']

test_data = """RegExr Yeah was created by gskinner.com.
yippe, ow, ouch, gosh Yeah oh, goodness, oh well, oh no, how can I do wonders in this world. Yep, it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
"""

# Remove words
test_data = ' '.join(filter(lambda i: i.strip(',') not in words_to_remove, test_data.split(' ')))

print(test_data)

Output:

RegExr was created by gskinner.com.
yippe, ouch, how can I do wonders in this world. it is out of the world.
Edit the Expression & Text, to-see matches. Roll; over$ matches% or* the expr@ession for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
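The same idea might be extended to other punctuation by stripping every character in string.punctuation instead of only commas. This is a minimal sketch, assuming that is what you want: drop_words is a hypothetical helper name and the sample sentence is invented. Matching stays case-sensitive, so "Well," survives when only 'well' is in the list.

```python
import string


def drop_words(text, words):
    """Drop tokens whose punctuation-stripped form is in `words`.

    Hypothetical helper for illustration only.
    """
    blocked = set(words)
    kept = [
        token for token in text.split(' ')
        # strip() removes leading and trailing punctuation only, e.g. '(gosh)' -> 'gosh'
        if token.strip(string.punctuation) not in blocked
    ]
    return ' '.join(kept)


sample = "Well, (gosh) it works. Yeah! It really does, oh?"
print(drop_words(sample, ['gosh', 'oh', 'Yeah', 'well']))
# Well, it works. It really does,
```

As with the strip(',') version, the punctuation attached to a removed word disappears with it, which is why the result ends in a dangling comma here.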