How do I convert this print statement into a data frame? Python NLP LSA topics


I need to add these LSA topic keywords to the corresponding topic in my data frame. How can I get the output of this print statement into a data frame?

I am trying to get a data frame with the topic numbers in one column and their corresponding keywords in another.

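# most important words for each topic
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
vocab = vect.get_feature_names_out()

for i, comp in enumerate(lsa_model.components_):
    # pair each word with its weight in this topic
    vocab_comp = zip(vocab, comp)
    # keep the three highest-weighted words
    sorted_words = sorted(vocab_comp, key=lambda x: x[1], reverse=True)[:3]
    print("Topic " + str(i) + ": ")
    for t in sorted_words:
        print(t[0], end=" ")
    print("\n")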

Topic 1: xxx yyy zzz
. . .
Topic 8: fddd dddd dsdsd
Topic 9: akah ahkha ahkha

There are 2 best solutions below

Adib Nur On

Assuming you have a data frame named df where the LSA topics are stored as integers in the column df['topics'], you could do the following:

topic_map = {}
for i, comp in enumerate(lsa_model.components_):
    vocab_comp = zip(vocab, comp)
    sorted_words = sorted(vocab_comp, key=lambda x: x[1], reverse=True)[:3]
    # sorted_words holds (word, weight) tuples, so join only the words
    topic_map[i] = ' '.join(word for word, _ in sorted_words)

# replace each topic number with its top keywords
df['topics'] = df['topics'].map(topic_map)
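
To make the mapping step concrete, here is a minimal, self-contained sketch on toy data. The `vocab` list and `components` array are hypothetical stand-ins for the question's vocabulary and `lsa_model.components_`, and the column name `topic_keywords` is just an illustrative choice; it writes the keywords to a new column so the original topic numbers are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the vocabulary and lsa_model.components_
vocab = ["apple", "banana", "cherry", "date", "elder"]
components = np.array([
    [0.9, 0.1, 0.5, 0.3, 0.0],  # weights of each word in topic 0
    [0.0, 0.8, 0.2, 0.7, 0.6],  # weights of each word in topic 1
])

# Build {topic number: "top three words"} as in the loop above
topic_map = {}
for i, comp in enumerate(components):
    top_words = sorted(zip(vocab, comp), key=lambda pair: pair[1], reverse=True)[:3]
    topic_map[i] = " ".join(word for word, _ in top_words)

# Toy document frame with one integer topic label per row (assumed layout)
df = pd.DataFrame({"text": ["doc a", "doc b", "doc c"], "topics": [1, 0, 1]})

# Look up each row's topic number in topic_map
df["topic_keywords"] = df["topics"].map(topic_map)
print(df)
```

With `Series.map`, a topic number that is missing from `topic_map` becomes NaN instead of raising a KeyError, which may or may not be what you want.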
Graham Streich On

Add the following lines to the top of your work environment:

import pandas as pd

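headings = ['Name_of_Variable1', 'Name_of_Variable2']  # one name per tuple element; add more as needed
rows = []  # collects one (word, weight) tuple per row

And add the following line, or something similar, inside the for t in sorted_words: loop:

rows.append(t)

To look like:

for t in sorted_words:
    print(t[0], end=" ")
    rows.append(t)
print("\n")

After the outer loop has finished, build the data frame once:

df = pd.DataFrame(rows, columns=headings)

Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so collecting the rows in a list (or using pd.concat) is the safer route. For older pandas versions, the append function is described here: https://www.geeksforgeeks.org/python-pandas-dataframe-append/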
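
For the layout the question asks for (topic numbers in one column, keywords in another), one possible sketch is to collect one dictionary per topic and build the data frame in a single call. The `vocab` and `components` values below are made-up stand-ins for the real vocabulary and `lsa_model.components_`, and the column names `topic` and `keywords` are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the vocabulary and lsa_model.components_
vocab = ["red", "green", "blue", "cyan"]
components = np.array([
    [0.2, 0.7, 0.1, 0.9],  # topic 0
    [0.6, 0.0, 0.8, 0.3],  # topic 1
])

rows = []
for i, comp in enumerate(components):
    # three highest-weighted (word, weight) pairs for this topic
    top_words = sorted(zip(vocab, comp), key=lambda pair: pair[1], reverse=True)[:3]
    rows.append({"topic": i, "keywords": " ".join(word for word, _ in top_words)})

# Build the frame once, after the loop, instead of appending row by row
topics_df = pd.DataFrame(rows, columns=["topic", "keywords"])
print(topics_df)
```

Building the frame once from a list avoids the repeated copying that row-by-row appends cause, and it works on any pandas version.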