R - how to create DocumentTermMatrix for Korean words

Asked by Brian on 30 May 2022 at 08:45 (40 views)

I hope the text mining gurus here, including those who do not read Korean, can help me with a very specific question.

I'm currently trying to create a Document-Term Matrix (DTM) from a free-text variable that contains a mix of English and Korean words.

First, I used the cld3::detect_language function to remove the observations that are not in Korean.
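A minimal sketch of that filtering step, for anyone who wants to reproduce it. The `responses` vector is made-up example data standing in for my real free-text variable, and I am assuming that rows where no language is detected (NA) should be dropped too.

```r
library(cld3)

# Hypothetical free-text responses; in practice this is a column of the data.
responses <- c(
  "서비스가 훌륭했습니다",
  "The service was excellent",
  "직원들이 훌륭하게 응대해 주었습니다"
)

# detect_language() returns one language code per element ("ko" for Korean),
# or NA when no language can be identified reliably.
lang <- detect_language(responses)

# Keep only the rows identified as Korean; NA results are dropped as well.
korean_only <- responses[!is.na(lang) & lang == "ko"]
```

Very short responses are the most likely to come back as NA, so it may be worth checking how many rows this step discards.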

Second, I used the KoNLP package to extract only the nouns from the filtered data (Korean text only).

Third, I know that I can create a DTM rather easily with the tm package.
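The noun extraction looks roughly like the sketch below. The `korean_only` vector here is hard-coded example data, not my real data, and I am using `extractNoun()` with KoNLP's default dictionary. KoNLP needs a working Java installation, so this may not run as-is on every machine.

```r
library(KoNLP)

# Hypothetical Korean-only responses left after the language filter.
korean_only <- c(
  "서비스가 훌륭했습니다",
  "직원들이 훌륭하게 응대해 주었습니다"
)

# Call extractNoun() once per document so the result is always a list
# with one character vector of nouns per document.
nouns_per_doc <- lapply(korean_only, extractNoun)
```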

The issue is that when I use the tm package to create the DTM, I cannot restrict the terms to nouns. This is not a problem with English words, but Korean is a different story. For example, with KoNLP I can extract the noun "훌륭" from "훌륭히", "훌륭한", "훌륭하게", "훌륭하고", "훌륭했던", etc. The tm package does not recognize this and treats all of these forms as separate terms when creating a DTM.
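Here is a small reproduction of the problem on made-up sentences (not my real data). I am using `VCorpus` and setting `wordLengths = c(1, Inf)` only so that short Korean terms are not silently dropped by tm's default minimum term length of 3 characters.

```r
library(tm)

# Hypothetical documents, each containing a different inflected form.
docs <- c("훌륭히 해냈다", "훌륭한 서비스", "훌륭하게 응대했다")

corpus <- VCorpus(VectorSource(docs))

# The default tokenizer splits on whitespace, so every inflected form
# becomes its own column in the DTM.
dtm <- DocumentTermMatrix(corpus, control = list(wordLengths = c(1, Inf)))

# Lists the terms: the inflected forms appear separately, "훌륭" does not.
Terms(dtm)
```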

Is there any way I can create a DTM based on nouns that were extracted from KoNLP package?
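One direction that might work, sketched under assumptions rather than confirmed: paste each document's extracted nouns back into a single space-separated string and build the DTM from those strings. The `nouns_per_doc` list below is hard-coded, hypothetical output standing in for what `KoNLP::extractNoun()` would return, so the sketch runs without KoNLP.

```r
library(tm)

# Hypothetical result of running KoNLP::extractNoun() on each document;
# hard-coded here so the sketch runs without KoNLP installed.
nouns_per_doc <- list(
  c("훌륭", "서비스"),
  c("훌륭", "직원"),
  c("서비스", "직원", "훌륭")
)

# Collapse each document's nouns into one space-separated string.
noun_docs <- vapply(nouns_per_doc, paste, character(1), collapse = " ")

corpus <- VCorpus(VectorSource(noun_docs))

# wordLengths = c(1, Inf) overrides tm's default minimum of 3 characters,
# which would otherwise drop two-character nouns such as "훌륭".
dtm <- DocumentTermMatrix(corpus, control = list(wordLengths = c(1, Inf)))

inspect(dtm)
```

tm also accepts a custom function through the `tokenize` control option, so calling the noun extractor from there could be another route, but I have not verified that with KoNLP.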

I realize that if you don't read Korean, you may have difficulty understanding my question. I'm hoping someone can point me in the right direction.

Much appreciated in advance.
