How can I extract sentences with certain text in a spreadsheet?

63 Views Asked by Pinky Rosie At 11 May 2023 at 04:59

I have a spreadsheet that looks like this. I would like to keep the file column but extract only the sentences that contain the word "India". Is there a way to do that? I would prefer KNIME or R, but I am happy with any solution.

Only the sentences containing "India" are extracted, and the file column is kept.


There are 2 best solutions below

L Tyrone On 11 May 2023 at 05:41

This can be achieved using filter() from the dplyr package and str_detect() from the stringr package. Note that the pattern "India|india" in the following code matches both "India" and the uncapitalised "india" in case it exists. Do not put spaces around the |, because they become part of the pattern and it would then miss cases such as "India." or "India," where no space follows the word:

library(dplyr)
library(stringr)

# Some example data
df <- data.frame(File = c(1356, 1548, 1600, 1601),
                 Text = c("Digital India is an initiative by the Government of India to ensure that Government services are made available to citizens electronically by improving online infrastructure and by i",
                          "The textile industry in India traditionally, after agriculture, is the only industry that has generated huge employment for both skilled and unskilled labour. The textile industry conti",
                          "Some other text",
                          "This string has india without a capital I."))

# Keep only the rows whose Text contains "India" or "india"
df <- df %>%
  filter(str_detect(Text, "India|india"))

df
#   File   Text
# 1 1356   Digital India is an initiative by the Government of India to ensure that Government services are made available to citizens electronically by improving online infrastructure and by i
# 2 1548   The textile industry in India traditionally, after agriculture, is the only industry that has generated huge employment for both skilled and unskilled labour. The textile industry conti
# 3 1601   This string has india without a capital I.

akrun On 11 May 2023 at 06:31

We can also do this in base R with grepl(), where ignore.case = TRUE makes the match cover both "India" and "india":

subset(df, grepl("India", Text, ignore.case = TRUE))
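Both approaches above keep or drop whole rows, so a row that matches still carries every sentence in its Text cell. If only the individual sentences mentioning India are wanted, one possible base R sketch is to split each cell into sentences first and then filter. The example data and the names sentences, keep, result and Sentence are made up for illustration, and the splitting rule (whitespace after ., ! or ?) is a naive assumption that will mis-split abbreviations such as "Dr.".

```r
# Hypothetical example data; the File and Text column names follow the answers above
df <- data.frame(
  File = c(1356, 1548),
  Text = c("Digital India is an initiative. It improves online infrastructure.",
           "Some other text. The textile industry in India is large. It employs many."),
  stringsAsFactors = FALSE
)

# Split each Text cell into sentences at whitespace that follows ., ! or ?
# (naive rule, assumed here for illustration only)
sentences <- strsplit(df$Text, "(?<=[.!?])\\s+", perl = TRUE)

# Within each cell, keep only the sentences that mention India, ignoring case
keep <- lapply(sentences, function(s) s[grepl("india", s, ignore.case = TRUE)])

# Build one row per matching sentence, repeating the File value alongside it
result <- data.frame(
  File = rep(df$File, lengths(keep)),
  Sentence = unlist(keep),
  stringsAsFactors = FALSE
)

result
```

Each row of result pairs a File value with a single matching sentence, so a file with several matching sentences appears on several rows. For text with abbreviations or decimal numbers, a dedicated sentence tokenizer would likely be more reliable than this regular expression.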
