The time interval in hours using difftime and if else statements

47 Views Asked by At

R script. I want to do the following.

The dataset is grouped by pid, med and date1.

If pid or med changes then the hours_output == 255, otherwise the hours_output == the time interval in hours.

If the date1 alters, then the last datetime on that day == 255, otherwise the hours_output == the time interval in hours.

This is the dummy dataset.

df <- data.frame(
  pid = c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3)),
  med = c(rep("drugA", 4), rep("drugB", 4), rep("drugC", 4)),
  date1 = c("2023-02-01 09:00:00", "2023-02-01 12:00:00", "2023-02-01 14:00:00",
            "2023-02-02 10:00:00", "2023-02-02 18:00:00", "2023-02-03 11:00:00",
            "2023-02-04 09:00:00", "2023-02-04 12:00:00", "2023-02-05 10:00:00",
            "2023-02-06 08:00:00", "2023-02-06 12:00:00", "2023-02-06 14:00:00")
)

Desired output.

pid    med       date1               pid_change med_change   date1_change  hours_output

1     drugA     2023-02-01 09:00:00          0          0            0          3
1     drugA     2023-02-01 12:00:00          0          0            1          2
1     drugA     2023-02-01 14:00:00          0          0            1        255
2     drugA     2023-02-02 10:00:00          1          0            1        255
2     drugB     2023-02-02 18:00:00          0          1            1        255
2     drugB     2023-02-03 11:00:00          0          0            1        255
3     drugB     2023-02-04 09:00:00          1          0            1        255
3     drugB     2023-02-04 12:00:00          0          0            1        255
3     drugC     2023-02-05 10:00:00          0          1            1        255
4     drugC     2023-02-06 08:00:00          1          0            1        255
4     drugC     2023-02-06 12:00:00          0          0            1        255
4     drugC     2023-02-06 14:00:00          0          0            1        255

This is the sample script tried.

# Convert date1 to a POSIXct format
df$date1 <- as.POSIXct(df$date1)

# Add a column to track changes in pid, med, and date1
df <- df %>% mutate(pid_change = ifelse(pid != lag(pid, default = first(pid)), 1, 0), 
med_change = ifelse(med != lag(med, default = first(med)), 1, 0), 
date1_change = ifelse(date1 != lag(date1, default = first(date1)), 1, 0))

# Calculate the hours_output column based on changes in pid, med, and date1
df <- df %>%
  group_by(pid, med) %>%
  mutate(
    hours_output = ifelse(pid_change | med_change | date1_change, 255, as.numeric(c(diff(date1), 24), units = "hours"))
  ) %>%
  ungroup() %>%
  mutate(
    hours_output = ifelse(is.na(hours_output), -999, hours_output),
    hours_output = abs(hours_output)
  )

df

returns the following.

pid    med       date1               pid_change med_change   date1_change  hours_output

1     drugA     2023-02-01 09:00:00          0          0            0          3
1     drugA     2023-02-01 12:00:00          0          0            1        255
1     drugA     2023-02-01 14:00:00          0          0            1        255
2     drugA     2023-02-02 10:00:00          1          0            1        255
2     drugB     2023-02-02 18:00:00          0          1            1        255
2     drugB     2023-02-03 11:00:00          0          0            1        255
3     drugB     2023-02-04 09:00:00          1          0            1        255
3     drugB     2023-02-04 12:00:00          0          0            1        255
3     drugC     2023-02-05 10:00:00          0          1            1        255
4     drugC     2023-02-06 08:00:00          1          0            1        255
4     drugC     2023-02-06 12:00:00          0          0            1        255
4     drugC     2023-02-06 14:00:00          0          0            1        255
0

There are 0 best solutions below