Edit - I broke down the steps using advice I received below and managed to complete this on my own. I will briefly go over the steps in the comments. Thank you!
I am working with a public dataset that has many sources of designation of a community being disadvantaged (DAC). I want to create another column that will indicate "Yes" if ANY of the designations are "yes" (even if other sources indicated "no).
I'm also wondering if one column name is a problem because it contains parenthesis - DAC(CARB)_LIC(CARB) I found somewhere online where it said that backticks ` help in this situation.
As for the first problem, I tried a number of different solutions I found searching online but nothing has worked so far. I can't find many examples of multiple OR conditions (using |) to create a column.
I am trying name the new column DAC. Columns which designate whether a community is disadvantaged are:
- Disadvantaged_Community_OEHHA - "YES" or "NO" in all caps
- Disadvantaged_Community_SB535 - "Yes" or "No" with only the first letter capitalized
- DAC(CARB)_LIC(CARB) - "Low Income and Disadvantaged" or "NA"
- Disadvantaged_Community_CE53 - "YES" or "NO" in all caps
- Disadvantaged_Community_CE54 - "YES" or "NO" in all caps
This is the last thing I tried:
clean_transport_2022 <-transform(clean_transport_2022, DAC = ifelse(Disadvantaged_Community_OEHHA == "YES" | Disadvantaged_Community_SB535 == "Yes", `DAC(CARB)_LIC(CARB)` == "Low Income and Disadvantaged" | Disadvantaged_Community_CE53 = "YES" | Disadvantaged_Community_CE54 = "YES", "Yes", "No"))
Edit - Here is the head() for the dataset -
> head(clean_transport_2022)
# A tibble: 6 × 22
Recipient_Contractor Project_Title Project_Amount Project_Type
<chr> <chr> <dbl> <chr>
1 ABAG Bay Area EV … 14533. Electric Ve…
2 CA EV Alliance Bay Area Cha… 12474. Electric Ve…
3 RTC Fuels, LLC dba Pea… Pearson Fuel… 71053 E85 Fueling…
4 Blink Acquisition Nissan Elect… 6849. Electric Ve…
5 Redwood Coast Energy A… North Coast … 70000 Electric Ve…
6 ABAG Bay Area EV … 7266. Electric Ve…
# ℹ 18 more variables: Fuel_Type <chr>, Supply_Chain_Phase <chr>,
# RDD_D_Phase <chr>, Status <chr>, Project_City <chr>,
# Project_Zip <chr>, County <chr>, Air_District <chr>,
# Electric_Service_Area <chr>,
# Disadvantaged_Community_OEHHA <chr>,
# Disadvantaged_Community_SB535 <chr>, Low_Income_SB535 <chr>,
# Project_Location <chr>, LowIncome_SB1000 <chr>, …
As you want to operate accross columns, this is not easy, since R is designed to work column by column. Here is a solution using base R (without importing libraries). I used
applyto operate rowwise.You are right about the backtiks, but since the solution does not need to use column names, you don't have to use backticks here. If you don't need to operate on all your columns, the easiest option is to create a new
data.framewith only the columns where do you want to check your conditions. Another options would be, using the subset inside theapply.As you did not provide any sample of your data, I created a synthetic dataframe that I believe is similar to yours:
Created on 2024-02-25 with reprex v2.1.0
I would recommend to read this to post future questions.