Handling last day of February of each year in POSIXct string

33 Views Asked by At

I have a time series from 2014 until now (2023-11-15) in 10min intervall. The data gets aggregated to a 1-hour interval.

data_2014_now_stations_hourly <- data_2014_now_stations %>%
    group_by(weather_station, time = lubridate::floor_date(time, "1 hour")) %>%
    summarise(
      rHumid = mean(rHumid)
    ) %>% 
    ungroup()

nrow(data_2014_now_stations_hourly) 

Furthermore, there is a nc file with 1 hour time steps. At a certain position, the values of every time step get extracted.

data_extracted <- raster::extract(grid, station_relevant) %>%
      t() %>%
      as.data.frame(.) %>%
      mutate(time = rownames(.)) %>%
      `rownames<-`(NULL) %>%
      `colnames<-`(c("grid_value", "time")) %>%
      mutate(time = strptime(time, "X%Y.%m.%d.%H.%M.%S"))

Df1 (data_2014_now_stations_hourly) and Df2 (data_extracted) have the same numbers of rows (86521), even though they are NOT identical (I don't know why). They have NA values at the same positions (even there are no 'NA' visible).

> length(data_extracted$time)
[1] 86521
> length(data_2014_now_stations_hourly$time)
[1] 86521
> identical(data_extraced$time,data_2014_now_stations_hourly$time)
[1] FALSE
> data_2014_now_stations_hourly$time[which(is.na(data_2014_now_stations_hourly$time))]
 [1] "2014-03-30 02:00:00" "2015-03-29 02:00:00" "2016-03-27 02:00:00" "2017-03-26 02:00:00" "2018-03-25 02:00:00"
 [6] "2019-03-31 02:00:00" "2020-03-29 02:00:00" "2021-03-28 02:00:00" "2022-03-27 02:00:00" "2023-03-26 02:00:00"
> data_extracted$time[which(is.na(data_extracted))]
 [1] "2014-03-30 02:00:00" "2015-03-29 02:00:00" "2016-03-27 02:00:00" "2017-03-26 02:00:00" "2018-03-25 02:00:00"
 [6] "2019-03-31 02:00:00" "2020-03-29 02:00:00" "2021-03-28 02:00:00" "2022-03-27 02:00:00" "2023-03-26 02:00:00"

The date/time values where NA is found is always the last day of February of each year at 2:00:00. I know this has to do with the leap years. But this shouldn't be a problem so far

Now I want to join the two df's by the 'time' column. It works, but I receive the following Warning message:

> join <- left_join(extracted_values,data_2014_now_stations_hourly,by="time")
Warning message:
In left_join(extracted_values, data_2014_now_stations_hourly_t,  :
  Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 2115 of `x` matches multiple rows in `y`.
ℹ Row 2115 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.

Row 2115 is one of those rows with the last day of February. Checking the summary of resulting df:

> summary(join)
   grid_value          time                            rHumid       
 Min.   : 10.61   Min.   :2014-01-01 00:00:00.00   Min.   :  9.833  
 1st Qu.: 59.59   1st Qu.:2016-06-20 06:30:00.00   1st Qu.: 61.667  
 Median : 78.96   Median :2018-12-08 12:00:00.00   Median : 80.000  
 Mean   : 73.59   Mean   :2018-12-08 11:36:28.98   Mean   : 74.674  
 3rd Qu.: 90.24   3rd Qu.:2021-05-27 18:30:00.00   3rd Qu.: 90.667  
 Max.   :100.00   Max.   :2023-11-15 00:00:00.00   Max.   :100.000  
 NA's   :4        NA's   :100                      NA's   :16       

> join[which(is.na(join$time)),]

      grid_value time   rHumid
2115       66.28 <NA> 70.83333
2116       66.28 <NA> 73.83333
2117       66.28 <NA> 92.66667
2118       66.28 <NA> 91.00000
2119       66.28 <NA> 82.50000
2120       66.28 <NA> 81.00000
2121       66.28 <NA> 77.16667
2122       66.28 <NA> 82.00000
2123       66.28 <NA> 75.00000
2124       66.28 <NA> 71.33333
10860      71.04 <NA> 70.83333
10861      71.04 <NA> 73.83333
10862      71.04 <NA> 92.66667
10863      71.04 <NA> 91.00000
10864      71.04 <NA> 82.50000
10865      71.04 <NA> 81.00000
10866      71.04 <NA> 77.16667
10867      71.04 <NA> 82.00000
10868      71.04 <NA> 75.00000
10869      71.04 <NA> 71.33333
19605      92.04 <NA> 70.83333
19606      92.04 <NA> 73.83333
19607      92.04 <NA> 92.66667
19608      92.04 <NA> 91.00000
19609      92.04 <NA> 82.50000
19610      92.04 <NA> 81.00000
19611      92.04 <NA> 77.16667
19612      92.04 <NA> 82.00000
19613      92.04 <NA> 75.00000
19614      92.04 <NA> 71.33333
28350      89.41 <NA> 70.83333
28351      89.41 <NA> 73.83333
28352      89.41 <NA> 92.66667
28353      89.41 <NA> 91.00000
28354      89.41 <NA> 82.50000
28355      89.41 <NA> 81.00000
28356      89.41 <NA> 77.16667
28357      89.41 <NA> 82.00000
28358      89.41 <NA> 75.00000
28359      89.41 <NA> 71.33333
37095      80.10 <NA> 70.83333
37096      80.10 <NA> 73.83333
37097      80.10 <NA> 92.66667
37098      80.10 <NA> 91.00000
37099      80.10 <NA> 82.50000
37100      80.10 <NA> 81.00000
37101      80.10 <NA> 77.16667
37102      80.10 <NA> 82.00000
37103      80.10 <NA> 75.00000
37104      80.10 <NA> 71.33333
46008      77.84 <NA> 70.83333
46009      77.84 <NA> 73.83333
46010      77.84 <NA> 92.66667
46011      77.84 <NA> 91.00000
46012      77.84 <NA> 82.50000
46013      77.84 <NA> 81.00000
46014      77.84 <NA> 77.16667
46015      77.84 <NA> 82.00000
46016      77.84 <NA> 75.00000
46017      77.84 <NA> 71.33333
54753      72.71 <NA> 70.83333
54754      72.71 <NA> 73.83333
54755      72.71 <NA> 92.66667
54756      72.71 <NA> 91.00000
54757      72.71 <NA> 82.50000
54758      72.71 <NA> 81.00000
54759      72.71 <NA> 77.16667
54760      72.71 <NA> 82.00000
54761      72.71 <NA> 75.00000
54762      72.71 <NA> 71.33333
63498      85.87 <NA> 70.83333
63499      85.87 <NA> 73.83333
63500      85.87 <NA> 92.66667
63501      85.87 <NA> 91.00000
63502      85.87 <NA> 82.50000
63503      85.87 <NA> 81.00000
63504      85.87 <NA> 77.16667
63505      85.87 <NA> 82.00000
63506      85.87 <NA> 75.00000
63507      85.87 <NA> 71.33333
72243      87.75 <NA> 70.83333
72244      87.75 <NA> 73.83333
72245      87.75 <NA> 92.66667
72246      87.75 <NA> 91.00000
72247      87.75 <NA> 82.50000
72248      87.75 <NA> 81.00000
72249      87.75 <NA> 77.16667
72250      87.75 <NA> 82.00000
72251      87.75 <NA> 75.00000
72252      87.75 <NA> 71.33333
80988      65.06 <NA> 70.83333
80989      65.06 <NA> 73.83333
80990      65.06 <NA> 92.66667
80991      65.06 <NA> 91.00000
80992      65.06 <NA> 82.50000
80993      65.06 <NA> 81.00000
80994      65.06 <NA> 77.16667
80995      65.06 <NA> 82.00000
80996      65.06 <NA> 75.00000
80997      65.06 <NA> 71.33333

Does this problem has influence on the correctness of the join operation? How can I handle these days in February? Any help appreciated!

0

There are 0 best solutions below