Create new column with string containing column names based on condition in R

Question

Create new column with string containing column names based on condition in R

614 Views Asked by EmmaV97 At 22 December 2022 at 15:54

I need to clean up a large datafile (±1300 columns) and store it into a logical data frame. I know it should be possible, but I have no idea how.

So my data only contains 'yes', 'no' or 'n/a' values, but it's the columns they are in that have all the needed information. It's about who you spent time with during your day. Each row is 1 participant. Here is an example:

Monday.06.00.adult.1	Monday.06.00.adult.2	Monday.06.00.adult.3	Monday.07.00.adult.1	Monday.07.00.adult.2	Monday.07.00.adult.3	Monday.08.00.adult.1	Monday.08.00.adult.2	Monday.08.00.adult.3
yes	yes	n/a	yes	yes	n/a	no	yes	n/a
yes	no	no	yes	no	no	yes	yes	yes

Here you can deduct that participant 1 spends his Monday morning at 6 o'clock with adult 1 and adult 2, and there is no adult 3. Adult 1 leaves at 8, as then his presence is changed to 'no'. For participant 2 we see that his mornings start with only adult 1, and adult 2 and 3 join at 8. I have columns for every hour of every day for a whole week for spending it with 5 adults and 5 children... How I would want my data to look at the end:

Participant	Day	Hour	SpentWith
1	Monday	06	Adult1, Adult2
1	Monday	07	Adult1, Adult2
1	Monday	08	Adult2
2	Monday	06	Adult1
2	Monday	07	Adult1
2	Monday	08	Adult1, Adult2, Adult3

Is there any way that I can select (per row) all the 'yes' answers, and then combine their column names if they overlap on the day and hour? (So for participant 1, the 'yes' answers would be in column Monday.06.00.adult.1 and Monday.06.00.adult.2, and I need to be able to extract and then combine 'Adult1' and 'Adult2' and store it into a separate column.

I've tried to collect all column names from the 'yes' answers with

names(data)[which(data == "yes", arr.ind=T)[, "col"]]

But this provides me with a list where I cannot see which value (column name) belongs to which participant, which makes it impossible to concatenate the right ones.

"monday.06.00.adult.1"      "monday.06.00.adult.1"      "monday.06.00.adult.2"      "monday.06.00.adult.2"      "monday.07.00.adult.1"      "monday.07.00.adult.2"      "monday.07.00.adult.2".     "monday.07.00.child.1"      "monday.07.00.child.1"

(This is a random example)

I've thought about using dplyr (but I have little experience with it).

data <- data %>% mutate(SpentWith = case_when(
if monday.07.00.adult.1 == "yes", ~ "Adult1",
if monday.07.00.adult.2 == "yes", ~ "Adult2",
if monday.07.00.adult.3 == "yes", ~ "Adult3"
))

Where it would be ideal if it would add on the 'Adult1', 'Adult2', 'Adult3' strings in one character variable if it meets that condition.

Any help in any step of this process would be very helpful!

Original Q&A

There are 1 best solutions below

**zephryl** · Answer 1 · 2022-12-22T16:01:50.840000

Using dplyr and tidyr::pivot_longer():

library(dplyr) 
library(tidyr)

data %>% 
  mutate(Participant = row_number()) %>%   # add participant IDs
  pivot_longer(                            # pivot rows to columns:
    !Participant,                            ## use all columns except Participant
    names_sep = "\\.",                       ## break up column names by "."
    names_to = c("Day", "Hour", "Minute", "Adult", "SpentWith")  ## names for new cols
  ) %>% 
  filter(value == "yes") %>%              # keep "yes" values only
  group_by(Participant, Day, Hour) %>%    # within each Participant / Day / Hour:
  summarize(                              # combine SpentWith values into one string
    SpentWith = paste0("Adult", SpentWith, collapse = ", "),
    .groups = "drop"
  )

# A tibble: 6 × 4
  Participant Day    Hour  SpentWith             
        <int> <chr>  <chr> <chr>                 
1           1 Monday 06    Adult1, Adult2        
2           1 Monday 07    Adult1, Adult2        
3           1 Monday 08    Adult2                
4           2 Monday 06    Adult1                
5           2 Monday 07    Adult1                
6           2 Monday 08    Adult1, Adult2, Adult3

Create new column with string containing column names based on condition in R

There are 1 best solutions below

Related Questions in R

Related Questions in DATAFRAME

Related Questions in DPLYR

Related Questions in DATA-CLEANING

Related Questions in COLUMNSORTING

Trending Questions

Popular # Hahtags

Popular Questions