The data was imported from a CSV file, with the dates originally in characters. I have mutated them separately as I want to do some further analysis on them.
The only other question on the topic I can find, the answer is complicated by needing to allow for pre-1970 dates, but my data is from the last few years, so that isn't a consideration.
EARLIEST_DATE <- c(06/07/2021, 15/11/2019, 01/10/2019, 24/09/2019, 10/11/2020, 21/09/2021, 21/07/2020, 18/10/2022, 05/10/2021, 22/10/2021)
LATEST_DATE <- c(28/09/2021, 20/12/2019, 31/03/2020, 18/09/2020, 15/06/2021, 28/06/2022, 04/09/2020, 28/02/2023, 02/11/2021, 08/07/2022)
library(dplyr)
df <- data.frame(EARLIEST_DATE, LATEST_DATE)
df %>%
mutate(EARLIEST_DATE = as.Date(EARLIEST_DATE,'%d/%m/%Y'),
LATEST_DATE = as.Date(LATEST_DATE,'%d/%m/%Y'))
df %>%
summarise(weeks = ceiling(as.numeric(difftime(EARLIEST_DATE, LATEST_DATE, units = "weeks"))))
I get the following results:
1 -1157
2 -265
3 78
4 209
5 -239
6 -352
7 774
8 970
9 152
10 744
I can't work out why it's giving such wildly incorrect values.
They should all be positive, and none of the intervals is more than a year, so it's not even as if it's presenting the difference in days. The ceiling clause is so that the value returned for weeks is at least 1.
I can calculate the differences fine in Excel, but the file is so large it keeps crashing whenever I try to do anything, hence using R.
Try parsing them as strings before the calculation. Using lubridate's
dmy()to parse day-month-year formatted strings, I seem to get a more sensible output.Created on 2023-04-12 with reprex v2.0.2