Binary output if the keyword is present in the sentence based on context - NLP


Part 1 :

I am trying to build an NLP model that outputs 1 if a keyword is present in the text and 0 otherwise. However, the context of the sentence also needs to be considered, to decide whether the condition named by the keyword is actually present.

For example:

KEYWORDS = ["jaundice", "jaundiced", "portal hypertension"]
Sentence = "No Jaundice and hypertension detected"

Here the expected output is 0: although the keyword is present, the context says the patient does not have jaundice.

My code :

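# Normalize the keywords to lowercase once.
KEYWORDS = [k.lower() for k in KEYWORDS]

def find_keyword(text, keywords):
    # Return 1 if any keyword occurs in the text (case-insensitive), else 0.
    text = text.lower()
    return int(any(keyword in text for keyword in keywords))

# Check every keyword in a single pass so a later keyword cannot overwrite an earlier match.
df1['output'] = df1['CLINICAL_DOCUMENT_TEXT'].apply(lambda t: find_keyword(t, KEYWORDS))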

df_output=df1[['MCN','CLINICAL_DOCUMENT_TEXT','output']]

df_output.head()

Here the text data is in the column "CLINICAL_DOCUMENT_TEXT" of the dataframe. I want the output as 1 or 0 in the new column output. MCN is the user id.

My output :

MCN     CLINICAL_DOCUMENT_TEXT                           output
2478812 PROGRESS NOTE Service: Hospitalist 6 SU...          0
2478812 PROGRESS NOTE Service: Hospitalist 3 ...            0
2478812 Encounter created for infectious disease scre...    0
2478812 Clinical Note Types: Progress NUTRITION V...        0
2478812 Facts: pt is 59 yo male admitted with covid p...    0

Because my code cannot capture the context, it labels the text as 1 whenever a keyword is present, even when the sentence means "no jaundice".
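One common starting point for this kind of problem is a rule-based negation check in the spirit of the NegEx algorithm: a keyword only counts if no negation cue appears shortly before it in the same sentence. The following is a minimal sketch under that assumption, not a complete clinical solution. The function name `keyword_affirmed`, the cue list `NEGATION_CUES`, and the `window` size are all hypothetical choices made for illustration; real clinical notes would need a much larger cue list and handling for cases such as "rule out" or negation that follows the term.

```python
import re

# Hypothetical, deliberately small cue list (an assumption, not a standard).
NEGATION_CUES = ("no", "not", "without", "denies", "denied", "negative for")
CUE_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(cue) for cue in NEGATION_CUES) + r")\b"
)


def keyword_affirmed(text, keywords, window=5):
    """Return 1 if any keyword occurs with no negation cue among the
    `window` words before it in the same sentence, else 0."""
    # Crude sentence split on punctuation and newlines.
    for sentence in re.split(r"[.;!?\n]+", text.lower()):
        for keyword in keywords:
            # Word boundaries keep "jaundice" from matching inside "jaundiced".
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            for match in re.finditer(pattern, sentence):
                preceding = sentence[:match.start()].split()[-window:]
                if not CUE_RE.search(" ".join(preceding)):
                    return 1  # at least one non-negated mention
    return 0


KEYWORDS = ["jaundice", "jaundiced", "portal hypertension"]
print(keyword_affirmed("No Jaundice and hypertension detected", KEYWORDS))
print(keyword_affirmed("Patient is jaundiced.", KEYWORDS))
```

Applied to the dataframe from the question, this could replace the plain substring test, for example `df1['output'] = df1['CLINICAL_DOCUMENT_TEXT'].apply(lambda t: keyword_affirmed(t, KEYWORDS))`. The fixed word window is a simplification: it will miss negations that sit further away and may wrongly negate a keyword that merely follows an unrelated "no".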

Part 2 :

I am trying to create a pivot table for each user showing which keywords are present in the text.

My code :

def keywords(row):
    # Return the list of keywords found in this row's text (case-insensitive).
    text = row['CLINICAL_DOCUMENT_TEXT'].upper()
    Keywords = ["jaundice", "jaundiced", "portal hypertension"]  # I have 75 keywords
    return [key for key in Keywords if key.upper() in text]

df1['keyword'] = df1.apply(keywords, axis=1) 

df_pivot=df1.explode('keyword').pivot_table(index ="MCN" , columns = "keyword").fillna(0).astype(int).reset_index()
    
df_pivot.columns=df_pivot.columns.droplevel(0)
df_pivot.head()

My output :

keyword     SBP TIPS    ascites asterixis   black stools    jaundice    jaundiced   lactulose   melena  ... SBP TIPS    ascites asterixis   black stools    jaundice    jaundiced   lactulose   melena  rifaximin
0   1175880 -2147483648 0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
1   1376151 -2147483648 0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
2   1784428 -2147483648 0   0   0   0   0   0   0   0   ... 0   0   0   0   0   0   0   0   0   0
3   1932574 0   1199667889  0   0   0   -2147483648 0   0   0   ... 0   0   0   0   0   0   0   0   0   0
4   1977098 -2147483648 0   0   0   0   0   0   -2147483648 -2147483648 ... 0   0   0   0   0   0   0   0   0   0
5 rows × 41 columns
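A possible reason for the repeated column names and odd values above: when `pivot_table` is called without `values` or `aggfunc`, it averages every remaining numeric column for each keyword, which is not a presence flag. The sketch below shows one alternative, assuming the goal is a 0/1 table with one row per MCN and one column per keyword. It uses `pd.crosstab` on a small made-up dataframe that stands in for `df1` after the `keyword` column has been added; the sample values and the name `presence` are hypothetical.

```python
import pandas as pd

# Toy stand-in for df1 with the 'keyword' list column (made-up data).
df1 = pd.DataFrame({
    "MCN": [101, 101, 202, 303],
    "keyword": [["jaundice", "ascites"], ["jaundice"], [], ["melena"]],
})

# One row per (document, keyword); empty lists become NaN and are dropped.
exploded = df1.explode("keyword").dropna(subset=["keyword"])

# Count mentions per user and keyword, then clip counts to a 0/1 flag.
presence = pd.crosstab(exploded["MCN"], exploded["keyword"]).clip(upper=1)

# Re-add users with no matched keywords as all-zero rows.
presence = presence.reindex(df1["MCN"].unique(), fill_value=0)

print(presence)
```

Dropping `.clip(upper=1)` would keep the number of documents per user that mention each keyword instead of a binary flag.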

Any guidance on how to solve these problems better would be appreciated. Thank you!
