How do I exclude certain strings when using grepl?


I have a dataframe like this:

df <- data.frame(
  Food = c("Apple", "Banana", "Carrot", "Donut", "Eclair", "Flour"),
  Ingredient = c("salt", "sodium chloride", "salt replacer", "unsalted", "veg salt", "vegetable salt")
)

I want to use grepl to create a variable that is TRUE when the ingredient is "salt" or "sodium chloride", but FALSE for the other values: "salt replacer", "unsalted", "veg salt", and "vegetable salt".

The output should be a dataframe that looks like this:

Food    Ingredient       Salt_Present
Apple   salt             TRUE
Banana  sodium chloride  TRUE
Carrot  salt replacer    FALSE
Donut   unsalted         FALSE
Eclair  veg salt         FALSE
Flour   vegetable salt   FALSE

I am having difficulty writing the regex to achieve this.

How can I write a regex that will return true for Apple and Banana, but false for the other cases in the data?

DuesserBaest On BEST ANSWER

Try this:

library(tidyverse)

df <- data.frame(
  Food = c("Apple", "Banana", "Carrot", "Donut", "Eclair", "Flour"),
  Ingredient = c("salt", "sodium chloride", "salt replacer", "unsalted", "veg salt", "vegetable salt")
)

df %>% mutate(
  Salt_Present = grepl("^salt$|^sodium chloride$", Ingredient)
)

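^ anchors the match to the start of the string and $ to the end, so each alternative must match the whole value. This rules out partial matches such as "unsalted" or "salt replacer".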
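
The anchored pattern above can also be written with one pair of anchors around a group, which may be easier to extend with further terms. This is a minimal base-R sketch, assuming the df from the question; the names unanchored and pattern are introduced here only for illustration.

```r
# Data frame from the question
df <- data.frame(
  Food = c("Apple", "Banana", "Carrot", "Donut", "Eclair", "Flour"),
  Ingredient = c("salt", "sodium chloride", "salt replacer", "unsalted", "veg salt", "vegetable salt")
)

# Without anchors, "salt" matches anywhere inside the string,
# so "unsalted", "veg salt", etc. are matched as well
unanchored <- grepl("salt", df$Ingredient)

# One pair of anchors around a group: the whole string must equal
# one of the alternatives (illustrative variable name)
pattern <- "^(salt|sodium chloride)$"
df$Salt_Present <- grepl(pattern, df$Ingredient)

print(df)
```

Comparing unanchored with df$Salt_Present shows which rows the anchors exclude.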

Umar On
library(dplyr)

df <- data.frame(
  Food = c("Apple", "Banana", "Carrot", "Donut", "Eclair", "Flour"),
  Ingredient = c("salt", "sodium chloride", "salt replacer", "unsalted", "veg salt", "vegetable salt")
)

# Assign the result back to df so that print(df) shows the new column
df <- df %>%
  mutate(Salt_Present = case_when(
    Ingredient %in% c("salt", "sodium chloride") ~ TRUE,
    TRUE ~ FALSE
  ))

print(df)

    Food      Ingredient Salt_Present
1  Apple            salt         TRUE
2 Banana sodium chloride         TRUE
3 Carrot   salt replacer        FALSE
4  Donut        unsalted        FALSE
5 Eclair        veg salt        FALSE
6  Flour  vegetable salt        FALSE

TarJae On

In this case I suggest using if_else() with the %in% operator:

library(dplyr)

df %>% 
  mutate(Salt_present = if_else(Ingredient %in% c("salt", "sodium chloride"), TRUE, FALSE))

    Food      Ingredient Salt_present
1  Apple            salt         TRUE
2 Banana sodium chloride         TRUE
3 Carrot   salt replacer        FALSE
4  Donut        unsalted        FALSE
5 Eclair        veg salt        FALSE
6  Flour  vegetable salt        FALSE

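Or, much better (thanks to @user2554330), just use the expression Ingredient %in% c("salt", "sodium chloride") on its own: it already returns TRUE or FALSE for each row, so wrapping it in if_else() is unnecessary.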
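
A minimal sketch of that last suggestion in base R, with no dplyr required. It assumes the df from the question; salt_terms is a name introduced here only for illustration.

```r
# Data frame from the question
df <- data.frame(
  Food = c("Apple", "Banana", "Carrot", "Donut", "Eclair", "Flour"),
  Ingredient = c("salt", "sodium chloride", "salt replacer", "unsalted", "veg salt", "vegetable salt")
)

# Exact values that count as salt (illustrative variable name)
salt_terms <- c("salt", "sodium chloride")

# %in% tests exact membership and returns a logical vector,
# so no regex and no if_else() wrapper is needed
df$Salt_Present <- df$Ingredient %in% salt_terms

print(df)
```

Because %in% compares whole values, it cannot produce the partial matches that an unanchored regex would.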