I have a time series from 2014 until now (2023-11-15) in 10min intervall. The data gets aggregated to a 1-hour interval.
data_2014_now_stations_hourly <- data_2014_now_stations %>%
group_by(weather_station, time = lubridate::floor_date(time, "1 hour")) %>%
summarise(
rHumid = mean(rHumid)
) %>%
ungroup()
nrow(data_2014_now_stations_hourly)
Furthermore, there is a nc file with 1 hour time steps. At a certain position, the values of every time step get extracted.
data_extracted <- raster::extract(grid, station_relevant) %>%
t() %>%
as.data.frame(.) %>%
mutate(time = rownames(.)) %>%
`rownames<-`(NULL) %>%
`colnames<-`(c("grid_value", "time")) %>%
mutate(time = strptime(time, "X%Y.%m.%d.%H.%M.%S"))
Df1 (data_2014_now_stations_hourly) and Df2 (data_extracted) have the same numbers of rows (86521), even though they are NOT identical (I don't know why). They have NA values at the same positions (even there are no 'NA' visible).
> length(data_extracted$time)
[1] 86521
> length(data_2014_now_stations_hourly$time)
[1] 86521
> identical(data_extraced$time,data_2014_now_stations_hourly$time)
[1] FALSE
> data_2014_now_stations_hourly$time[which(is.na(data_2014_now_stations_hourly$time))]
[1] "2014-03-30 02:00:00" "2015-03-29 02:00:00" "2016-03-27 02:00:00" "2017-03-26 02:00:00" "2018-03-25 02:00:00"
[6] "2019-03-31 02:00:00" "2020-03-29 02:00:00" "2021-03-28 02:00:00" "2022-03-27 02:00:00" "2023-03-26 02:00:00"
> data_extracted$time[which(is.na(data_extracted))]
[1] "2014-03-30 02:00:00" "2015-03-29 02:00:00" "2016-03-27 02:00:00" "2017-03-26 02:00:00" "2018-03-25 02:00:00"
[6] "2019-03-31 02:00:00" "2020-03-29 02:00:00" "2021-03-28 02:00:00" "2022-03-27 02:00:00" "2023-03-26 02:00:00"
The date/time values where NA is found is always the last day of February of each year at 2:00:00. I know this has to do with the leap years. But this shouldn't be a problem so far
Now I want to join the two df's by the 'time' column. It works, but I receive the following Warning message:
> join <- left_join(extracted_values,data_2014_now_stations_hourly,by="time")
Warning message:
In left_join(extracted_values, data_2014_now_stations_hourly_t, :
Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 2115 of `x` matches multiple rows in `y`.
ℹ Row 2115 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.
Row 2115 is one of those rows with the last day of February. Checking the summary of resulting df:
> summary(join)
grid_value time rHumid
Min. : 10.61 Min. :2014-01-01 00:00:00.00 Min. : 9.833
1st Qu.: 59.59 1st Qu.:2016-06-20 06:30:00.00 1st Qu.: 61.667
Median : 78.96 Median :2018-12-08 12:00:00.00 Median : 80.000
Mean : 73.59 Mean :2018-12-08 11:36:28.98 Mean : 74.674
3rd Qu.: 90.24 3rd Qu.:2021-05-27 18:30:00.00 3rd Qu.: 90.667
Max. :100.00 Max. :2023-11-15 00:00:00.00 Max. :100.000
NA's :4 NA's :100 NA's :16
> join[which(is.na(join$time)),]
grid_value time rHumid
2115 66.28 <NA> 70.83333
2116 66.28 <NA> 73.83333
2117 66.28 <NA> 92.66667
2118 66.28 <NA> 91.00000
2119 66.28 <NA> 82.50000
2120 66.28 <NA> 81.00000
2121 66.28 <NA> 77.16667
2122 66.28 <NA> 82.00000
2123 66.28 <NA> 75.00000
2124 66.28 <NA> 71.33333
10860 71.04 <NA> 70.83333
10861 71.04 <NA> 73.83333
10862 71.04 <NA> 92.66667
10863 71.04 <NA> 91.00000
10864 71.04 <NA> 82.50000
10865 71.04 <NA> 81.00000
10866 71.04 <NA> 77.16667
10867 71.04 <NA> 82.00000
10868 71.04 <NA> 75.00000
10869 71.04 <NA> 71.33333
19605 92.04 <NA> 70.83333
19606 92.04 <NA> 73.83333
19607 92.04 <NA> 92.66667
19608 92.04 <NA> 91.00000
19609 92.04 <NA> 82.50000
19610 92.04 <NA> 81.00000
19611 92.04 <NA> 77.16667
19612 92.04 <NA> 82.00000
19613 92.04 <NA> 75.00000
19614 92.04 <NA> 71.33333
28350 89.41 <NA> 70.83333
28351 89.41 <NA> 73.83333
28352 89.41 <NA> 92.66667
28353 89.41 <NA> 91.00000
28354 89.41 <NA> 82.50000
28355 89.41 <NA> 81.00000
28356 89.41 <NA> 77.16667
28357 89.41 <NA> 82.00000
28358 89.41 <NA> 75.00000
28359 89.41 <NA> 71.33333
37095 80.10 <NA> 70.83333
37096 80.10 <NA> 73.83333
37097 80.10 <NA> 92.66667
37098 80.10 <NA> 91.00000
37099 80.10 <NA> 82.50000
37100 80.10 <NA> 81.00000
37101 80.10 <NA> 77.16667
37102 80.10 <NA> 82.00000
37103 80.10 <NA> 75.00000
37104 80.10 <NA> 71.33333
46008 77.84 <NA> 70.83333
46009 77.84 <NA> 73.83333
46010 77.84 <NA> 92.66667
46011 77.84 <NA> 91.00000
46012 77.84 <NA> 82.50000
46013 77.84 <NA> 81.00000
46014 77.84 <NA> 77.16667
46015 77.84 <NA> 82.00000
46016 77.84 <NA> 75.00000
46017 77.84 <NA> 71.33333
54753 72.71 <NA> 70.83333
54754 72.71 <NA> 73.83333
54755 72.71 <NA> 92.66667
54756 72.71 <NA> 91.00000
54757 72.71 <NA> 82.50000
54758 72.71 <NA> 81.00000
54759 72.71 <NA> 77.16667
54760 72.71 <NA> 82.00000
54761 72.71 <NA> 75.00000
54762 72.71 <NA> 71.33333
63498 85.87 <NA> 70.83333
63499 85.87 <NA> 73.83333
63500 85.87 <NA> 92.66667
63501 85.87 <NA> 91.00000
63502 85.87 <NA> 82.50000
63503 85.87 <NA> 81.00000
63504 85.87 <NA> 77.16667
63505 85.87 <NA> 82.00000
63506 85.87 <NA> 75.00000
63507 85.87 <NA> 71.33333
72243 87.75 <NA> 70.83333
72244 87.75 <NA> 73.83333
72245 87.75 <NA> 92.66667
72246 87.75 <NA> 91.00000
72247 87.75 <NA> 82.50000
72248 87.75 <NA> 81.00000
72249 87.75 <NA> 77.16667
72250 87.75 <NA> 82.00000
72251 87.75 <NA> 75.00000
72252 87.75 <NA> 71.33333
80988 65.06 <NA> 70.83333
80989 65.06 <NA> 73.83333
80990 65.06 <NA> 92.66667
80991 65.06 <NA> 91.00000
80992 65.06 <NA> 82.50000
80993 65.06 <NA> 81.00000
80994 65.06 <NA> 77.16667
80995 65.06 <NA> 82.00000
80996 65.06 <NA> 75.00000
80997 65.06 <NA> 71.33333
Does this problem has influence on the correctness of the join operation? How can I handle these days in February? Any help appreciated!