Extracting words/phrase followed by a phrase


I have one text file with a list of phrases. Below is how the file looks:

Filename: KP.txt

[image: contents of KP.txt]

From the input paragraph below, I want to extract the 2 words that follow a phrase from KP.txt (the phrase could be any of those listed in my KP.txt file above).

Input:

This is Lee. Thanks for contacting me. I wanted to know the exchange policy at Noriaqer hardware services.

In the above example, the phrase "I wanted to know" matches an entry in the KP.txt file. So if I extract the next 2 words after it, my output should be "exchange policy".

How can I extract this in Python?


There are 3 best solutions below

0
dreamzboy On BEST ANSWER

Assuming you already know how to read the input file into a list, it can be done with some help from regex.

>>> import re
>>> wordlist = ['I would like to understand', 'I wanted to know', 'I wish to know', 'I am interested to know']
>>> input_text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
>>> def word_extraction(input_text, wordlist):
...     for word in wordlist:
...         if word in input_text:
...             # re.escape keeps any regex metacharacters in the phrase literal
...             output = re.search(r'(?<=%s)(.\w*){2}' % re.escape(word), input_text)
...             # output is None when nothing follows the phrase
...             if output:
...                 print(output.group().lstrip())
...
>>> word_extraction(input_text, wordlist)
exchange policy
>>> input_text = 'This is Lee. Thanks for contacting me. I wish to know where is Noriaqer hardware.'
>>> word_extraction(input_text, wordlist)
where is
>>> input_text = 'This is Lee. Thanks for contacting me. I\'d like to know where is Noriaqer hardware.'
>>> word_extraction(input_text, wordlist)
>>>
  1. First we check whether each phrase appears in the input text. Looping over every phrase is not the most efficient approach for a large list, but it works for now.
  2. Next, if a phrase from our "dictionary" of phrases is found, we use a regex to extract the words that follow it.
  3. Finally, we strip the leading whitespace in front of the extracted words.

Regex Hint:

  • (?<=%s) is a lookbehind assertion: the match must be immediately preceded by the key phrase (for example "I wanted to know"), and the phrase itself is not part of the match.
  • (.\w*){2} matches any single character (here, the space) followed by zero or more word characters, repeated twice, so the match stops 2 words after the key phrase.
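
The steps and regex hints above can be combined into a function that returns the result instead of printing it. This is only a sketch: the name next_two_words is made up for illustration, re.escape is added on the assumption that a phrase might contain regex metacharacters, and returning None when nothing matches is one possible choice.

```python
import re

def next_two_words(text, phrases):
    """Return the two words after the first phrase found in text, or None."""
    for phrase in phrases:
        # re.escape keeps characters such as '.' or '?' in a phrase literal
        pattern = r'(?<=%s)(.\w*){2}' % re.escape(phrase)
        match = re.search(pattern, text)
        if match:
            # drop the space that separates the phrase from the first word
            return match.group().lstrip()
    return None

phrases = ['I would like to understand', 'I wanted to know', 'I wish to know']
text = 'Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
print(next_two_words(text, phrases))
```

Because re.search returns None when the lookbehind never succeeds, the separate "if word in input_text" check is not needed in this version.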
5
Kunal Sharma On

You can use this:

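# read the phrases, lower-cased and stripped; skip blank lines,
# since splitting on an empty string raises ValueError
with open("KP.txt") as fobj:
    phrases = [line.strip().lower() for line in fobj if line.strip()]

# lower-case the paragraph too, so matching is case-insensitive
paragraph = input("Enter The Whole Paragraph in one line:\t").lower()

for phrase in phrases:
    if phrase in paragraph:
        # every piece after the first one follows an occurrence of the phrase
        temp = paragraph.split(phrase)[1:]
        for clause in temp:
            # print the first two words of each following piece
            print(" ".join(clause.split()[:2]))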
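
Note that lower-casing the paragraph also lower-cases the printed words. A possible variation, sketched below, searches a lower-cased copy but slices the original text so the output keeps its capitalisation. The function name words_after and the count parameter are hypothetical, and the approach assumes that lower() does not change the length of the text (true for plain ASCII input).

```python
def words_after(paragraph, phrases, count=2):
    """Collect the `count` words that follow every occurrence of each phrase."""
    lowered = paragraph.lower()
    results = []
    for phrase in phrases:
        key = phrase.strip().lower()
        if not key:
            continue  # skip blank lines from the phrase file
        start = lowered.find(key)
        while start != -1:
            end = start + len(key)
            # slice the original paragraph so the words keep their case
            results.append(" ".join(paragraph[end:].split()[:count]))
            # look for the next occurrence of the same phrase
            start = lowered.find(key, end)
    return results

print(words_after("i WANTED to know Exchange Policy today", ["I wanted to know"]))
```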

0
Mahmoud Aly On

I think natural language processing could be a better solution, but this code should help :)

def search_in_text(kp, text):
    for line in kp:
        # skip blank lines: an empty string is "in" every text
        if not line:
            continue
        # if a search phrase from kp is found in the text
        if line in text:
            # the starting index of the text that follows the phrase
            i1 = text.find(line) + len(line)
            # look at the next 50 characters at most (slicing stops at the end of the text)
            i2 = i1 + 50
            # split the following text into words; split() drops empty strings
            next_words = text[i1:i2].split()
            # return only the next two words from next_words
            return next_words[0:2]
    return []  # return an empty list if no phrase matches

# read your KP.txt file as a list of stripped lines
with open("KP.txt") as f:
    kp = [line.strip() for line in f]
#input 1
text = 'This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.'
print('input ->>', text)
output = search_in_text(kp, text)
print('output ->>', output)
input ->> This is Lee. Thanks for contacting me. I wanted to know exchange policy at Noriaqer hardware services.
output ->> ['exchange', 'policy']
#input 2
text = 'Boss was very angry and said: I wish to know why you are late?'
print('input ->>', text)
output = search_in_text(kp, text)
print('output ->>', output)
input ->> Boss was very angry and said: I wish to know why you are late?
output ->> ['why', 'you']
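
To try this without an existing KP.txt, the sketch below writes a small phrase file to a temporary directory, loads it, and runs a simplified search. The helper names load_phrases and next_words and the file contents are assumptions made for the demo; the 50-character window is left out here for brevity.

```python
import os
import tempfile

def load_phrases(path):
    """Read one phrase per line, dropping surrounding whitespace and blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

def next_words(phrases, text, count=2):
    """Return up to `count` words following the first phrase found in text."""
    for phrase in phrases:
        index = text.find(phrase)
        if index != -1:
            return text[index + len(phrase):].split()[:count]
    return []  # no phrase matched

# hypothetical file contents, written to a temporary directory for the demo
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "KP.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("I wanted to know\nI wish to know\n\n")
    phrases = load_phrases(path)

print(phrases)
print(next_words(phrases, "Boss said: I wish to know why you are late?"))
```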