In this type of data:
df
# A tibble: 9 × 6
id Utterance Story Climax Starttime_ms Endtime_ms
<dbl> <chr> <chr> <chr> <dbl> <dbl>
1 4 "yeah" NA NA 20405 23532
2 5 "Come Home " "Come Home " NA 20405 47677 #<--
3 6 ">last time " NA NA 23818 25110
4 7 "two weeks ago? " NA NA 25470 26259
5 8 "and X" NA NA 26623 32103
6 9 "and then last night" NA NA 32688 33797
7 10 "are you sure?" NA NA 34099 37542
8 11 "Come Home climax " NA "Come Home climax " 34099 39895 #<---
9 12 "=she said Y" NA NA 38075 39895
I need to re-arrange the rows in such a way that those rows
- where the interval between
Starttime_msandEndtime_msis larger than in the previous row AND - where the
Startttime_msis the same as in the previous row
are put before that previous row. How can that be done?
The desired output is this:
df
# A tibble: 9 × 6
id Utterance Story Climax Starttime_ms Endtime_ms
<dbl> <chr> <chr> <chr> <dbl> <dbl>
2 5 "Come Home " "Come Home " NA 20405 47677
1 4 "yeah" NA NA 20405 23532
3 6 ">last time " NA NA 23818 25110
4 7 "two weeks ago? " NA NA 25470 26259
5 8 "and X" NA NA 26623 32103
6 9 "and then last night" NA NA 32688 33797
8 11 "Come Home climax " NA "Come Home climax " 34099 39895
7 10 "are you sure?" NA NA 34099 37542
9 12 "=she said Y" NA NA 38075 39895
Data:
df <- structure(list(id = c(4, 5, 6, 7, 8, 9, 10, 11, 12), Utterance = c("yeah",
"Come Home ", ">last time ", "two weeks ago? ",
"and X",
"and then last night", "are you sure?",
"Come Home climax ", "=she said Y"
), Story = c(NA, "Come Home ", NA, NA, NA, NA, NA, NA, NA), Climax = c(NA,
NA, NA, NA, NA, NA, NA, "Come Home climax ", NA), Starttime_ms = c(20405,
20405, 23818, 25470, 26623, 32688, 34099, 34099, 38075), Endtime_ms = c(23532,
47677, 25110, 26259, 32103, 33797, 37542, 39895, 39895)), row.names = c(NA,
-9L), class = c("tbl_df", "tbl", "data.frame"))
This is not a convenient format for replication (see
?dputor thereprexpackage), but something likeshould work (the first argument to
arrangeis the primary sort key; later arguments are used as tiebreakers/sorting within primary categories). You can add aselect(-dt)if you don't want to keep the time-difference variable around.