Using for loop to search through string and create data frame

Question

186 Views Asked by scotiaboy At 22 June 2022 at 21:02

I am trying to use openNLP to look through rows of text and classify sentences into thematic buckets. Here is a sample df:

dat <- data.frame(text=c("A fluffy crab discovered off the coast of Western Australia has been named after the ship that carried Charles Darwin around the world. The new species, Lamarckdromia beagle, belongs to the Dromiidae family, commonly known as sponge crabs. Crustaceans in this family fashion and use sea sponges and ascidians – animals including sea squirts – for protection. They trim the creatures using their claws and wear them like hats. ",
                         "The inadmissibility of such actions, which violate the relevant legal and political obligations of the European Union and lead to an escalation of tensions, was pointed out, the ministry said in a statement. Speaking shortly after the meeting, Ederer said he had called on the Russian government to remain calm and resolve this issue diplomatically, the Russian news agency Tass reported."),
                  date=c(as.Date("2020-12-26"),as.Date("2020-12-31")), 
                  id= c("1", "2"))

I've gotten as far as splitting the text into sentences and then searching for the keywords, using the following code:


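library(NLP)        # as.String(), annotate()
library(openNLP)    # Maxent_Sent_Token_Annotator()
library(SnowballC)  # wordStem()

# split sentences and search for keywords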

all_sentence <- as.String(dat$text)

sent_annotator <- Maxent_Sent_Token_Annotator()
annotation <- annotate(all_sentence, sent_annotator)

split_text <- all_sentence[annotation]

# word list to search for 

word_dat <- data.frame(words=c("animal", "species", "political", "government"),
                  theme=c("nature", "nature", "geopolitics", "geopolitics"))

stem_keyword <- wordStem(word_dat$words, language = "english")


for(kw in stem_keyword) {
  x=grep(kw, split_text)
  print(split_text[x])
  print(stem_keyword[x])
}

However, my for loop doesn't print exactly what I'm looking for. For example, print(stem_keyword[x]) gives me the wrong keyword for the wrong sentence. In the end I don't want to print; I want to write the results to a new data frame with this structure:

final_df <- data.frame(text=c("A fluffy crab discovered off the coast of Western Australia has been named after the ship that carried Charles Darwin around the world.", "The new species, Lamarckdromia beagle, belongs to the Dromiidae family, commonly known as sponge crabs.","Crustaceans in this family fashion and use sea sponges and ascidians – animals including sea squirts – for protection.", "They trim the creatures using their claws and wear them like hats.",
                              "The inadmissibility of such actions, which violate the relevant legal and political obligations of the European Union and lead to an escalation of tensions, was pointed out, the ministry said in a statement.",
                              "Speaking shortly after the meeting, Ederer said he had called on the Russian government to remain calm and resolve this issue diplomatically, the Russian news agency Tass reported."),
                  keyword=c("null", "species", "animal", "null", "political", "government"),
                  theme=c("null", "nature", "nature", "null", "geopolitics", "geopolitics"), 
                  id= c("1", "1", "1", "1", "2", "2"))

Any advice or help getting my for loop to where I need it to be? TIA

EDIT: I would also like sentences that cannot be classified to appear in the final data frame with 'null' keywords and themes.

Original Q&A

**Jonathan** · Answer 1 · 2022-06-23T08:59:02.347000

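Inside the loop, x holds the positions of the matching sentences in split_text, so stem_keyword[x] indexes the keyword vector with sentence positions and returns an unrelated keyword. What you want to print instead is kw. Here is a complete solution for putting your data into a data frame anyway: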

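# For each stemmed keyword, pull the sentence that contains it.
# Assumes every keyword matches exactly one sentence; otherwise sapply()
# returns a list and building the data frame may fail.
final_df <- data.frame(text = sapply(stem_keyword, 
                                     function(kw) grep(kw, split_text, value = TRUE)), 
                          keyword = word_dat$words, 
                          theme = word_dat$theme)
# Row index in dat of each sentence; fixed = TRUE matches it as literal text.
final_df$id <- sapply(final_df$text, function(text) grep(text, dat$text, fixed = TRUE))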
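
The snippet above returns only the sentences that matched a keyword, while the edit to the question also asks for unmatched sentences to appear with 'null' values alongside the source id. A minimal base R sketch of one way to get that shape follows. It is not the answerer's code, and it rests on several assumptions: the regex splitter is a crude stand-in for the openNLP sentence annotator, `first_match` is a hypothetical helper name, the sample text is shortened from the question, and keywords are matched unstemmed so the example does not depend on SnowballC.

```r
# Sample data shortened from the question (assumption: two sentences per row).
dat <- data.frame(
  text = c(paste("The new species, Lamarckdromia beagle, belongs to the Dromiidae family,",
                 "commonly known as sponge crabs.",
                 "They trim the creatures using their claws and wear them like hats."),
           paste("The inadmissibility of such actions, which violate the relevant legal and",
                 "political obligations of the European Union, was pointed out.",
                 "Ederer said he had called on the Russian government to remain calm.")),
  id = c("1", "2"),
  stringsAsFactors = FALSE
)
word_dat <- data.frame(
  words = c("animal", "species", "political", "government"),
  theme = c("nature", "nature", "geopolitics", "geopolitics"),
  stringsAsFactors = FALSE
)

# Split each row on its own so every sentence keeps its source id.
# Stand-in splitter: break on whitespace that follows ., ! or ?
split_rows <- strsplit(trimws(dat$text), "(?<=[.!?])\\s+", perl = TRUE)
sentences <- data.frame(
  text = unlist(split_rows),
  id = rep(dat$id, lengths(split_rows)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: position of the first keyword found in a sentence,
# or NA when none of the keywords occurs.
first_match <- function(sentence, patterns) {
  hits <- which(vapply(patterns, grepl, logical(1), x = sentence, ignore.case = TRUE))
  if (length(hits) > 0) hits[[1]] else NA_integer_
}

idx <- vapply(sentences$text, first_match, integer(1),
              patterns = word_dat$words, USE.NAMES = FALSE)

# One row per sentence; unmatched sentences get "null" as requested.
final_df <- data.frame(
  text = sentences$text,
  keyword = ifelse(is.na(idx), "null", word_dat$words[idx]),
  theme = ifelse(is.na(idx), "null", word_dat$theme[idx]),
  id = sentences$id,
  stringsAsFactors = FALSE
)
print(final_df[, c("keyword", "theme", "id")])
```

To use this with the original pipeline, the per-row splitting step could call the openNLP annotator on each element of dat$text instead of the regex, and the stems from wordStem() could be passed as `patterns` in place of the raw words. Note that a sentence containing several keywords is assigned only the first one in word_dat order.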
