I have a long df that looks like this:
df <- data.frame(id=as.integer(c(123,123,123,124,124,124,125,125,126,126,126)),
date=as.Date(c("2014-03-12", "2015-05-02", "2015-09-16", "2015-10-24", "2016-12-11", "2017-10-17", "2017-08-06", "2018-01-29", "2015-09-16", "2015-11-12", "2015-12-03")),
event=as.character(c("A", "C", "E", "A", "B", "D", "A", "E", "A", "B", "E")),
order=as.integer(c(1,2,3,1,2,3,1,2,1,2,3)),
diff=as.integer(c(0,416,553,0,414,724,0,176,0,57,78)))
df
id date event order diff
1 123 2014-03-12 A 1 0
2 123 2015-05-02 C 2 416
3 123 2015-09-16 E 3 553
4 124 2015-10-24 A 1 0
5 124 2016-12-11 B 2 414
6 124 2017-10-17 D 3 724
7 125 2017-08-06 A 1 0
8 125 2018-01-29 E 2 176
9 126 2015-09-16 A 1 0
10 126 2015-11-12 B 2 57
11 126 2015-12-03 E 3 78
Each id will always have an initial event A and a final event, either D or E (mutually exclusive). Events B and C might or might not occur. diff is the difference in days between the date of each event and the date of the initial event A per id.
I want to obtain a wide df where each event will be a column (e.g. A_status) where 0= absent; 1= present. Similarly, each correspondent diff will be a column (e.g. A_time). But when event B or C is absent (e.g B= 0 or C= 0), I want their time to be filled with either D_time or E_time, whichever is present.
I need to create two columns based on the values of D and E:
- a
columnD.E_status, where0=D;1=E, and - a
columnD.E_timethat will receive whichevertimeis recorded (ofDorE).
This is the desired output:
id A_status A_time B_status B_time C_status C_time D.E_status D.E_time
123 1 0 0 553 1 416 1 553
124 1 0 1 414 0 724 0 724
125 1 0 0 176 0 176 1 176
126 1 0 1 57 0 78 1 78
I really appreciate your help on this given my very basic R skills.
I'll demonstrate a
dplyr/tidyrsolution. Up front, I think your last value forB_timeshould be57not78.