Recommended way to extract "the representative" (not necessarily most frequent) 4-grams in a corpus? TF-IDF or

23 Views Asked by Vahid At 01 September 2023 at 13:48

I have a corpus of 500 research articles, and I want to extract the top 4-grams NOT simply by highest frequency but by relevance to the research article genre in general (that is, the 4-grams characteristic of this genre).

TF-IDF was recommended, and using scikit-learn I get a list of 4-grams ranked by TF-IDF score.

Question: A high TF-IDF score means that the 4-gram appears in fewer articles. How are those 4-grams then representative of the research article genre if they appear in fewer articles? Is there any other approach you would recommend?
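To make the concern concrete, here is a small sketch that computes, for each 4-gram, the number of documents it occurs in (document frequency) next to its IDF weight. It assumes whitespace tokenization, three made-up documents, and the smoothed IDF formula that scikit-learn documents as its default, `ln((1 + n) / (1 + df)) + 1`. The helper names `four_grams` and `smoothed_idf` are hypothetical.

```python
import math
from collections import Counter

# Toy documents (hypothetical); the real corpus would be the 500 articles.
docs = [
    "the results of this study suggest that more work is needed",
    "the results of this study show that the model works",
    "the results of this experiment were obtained with a gamma ray detector",
]

def four_grams(text):
    """Return all word 4-grams of a text, using simple whitespace tokenization."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + 4]) for i in range(len(tokens) - 3)]

n_docs = len(docs)

# Document frequency: in how many documents each 4-gram occurs at least once.
df = Counter(g for d in docs for g in set(four_grams(d)))

def smoothed_idf(doc_freq, n):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n) / (1 + doc_freq)) + 1

idf = {g: smoothed_idf(c, n_docs) for g, c in df.items()}

for gram in ("the results of this", "results of this study", "a gamma ray detector"):
    print(f"df={df[gram]}  idf={idf[gram]:.3f}  {gram}")
```

In this toy data the 4-gram shared by every document gets the lowest IDF weight, while the one confined to a single document gets the highest, which is the behavior the question is about.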

Thanks.
