Add rows to the weather data for each day, placing the corresponding date at the top

94 Views Asked by At

I have weather data recorded at 15-minute intervals in R, and I need to format the data such that each day's data is followed by two blank rows with the corresponding date written on the top. Here's a glimpse of the required output:

enter image description here

Desired export file format

enter image description here

I have a dataframe (df) with columns date, temperature, precipitation, and wind_speed and looks like this. How can I achieve this formatting in R?

enter image description here

Here is the reproducible example:

df <- structure(list(date = structure(c(1401104700, 1401105600, 1401106500, 
                                        1401107400, 1401108300, 1401148800, 1401149700, 1401150600, 1401151500, 
                                        1401152400, 1401235200, 1401236100, 1401237000, 1401237900, 1401238800
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), temperature = c(25, 
                                                                   25.2, 25.3, 25.1, 25.4, 18.6, 18.3, 18.2, 18.2, 18.2, 19.7, 19.1, 
                                                                   18.7, 18.5, 18.3), precipitation = c(0, 0, 0, 0, 0, 0, 0, 0, 
                                                                                                        0, 0, 0, 0, 0, 0, 0), wind_speed = c(1.006, 1.006, 0.9054, 0.6036, 
                                                                                                                                             0.4024, 0.1006, 0.2012, 0.503, 0.1006, 0, 0, 0.1006, 0.2012, 
                                                                                                                                             0.1006, 0.2012)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
                                                                                                                                                                                                                       -15L))
2

There are 2 best solutions below

0
Stibu On BEST ANSWER

I first create a function that can take the data for one day from your data frame and add the desired rows at the beginning of that block. It also converts the dates to the format you show in your question. I use the function paste_comma() to add commas after all the values.

paste_comma <- function(x) paste0(x, ",")

add_header <- function(df) {
  day <- format(df$date[1], "%d_%m_%y")
  header <- tibble(
    date = c("", "", day, "Time,"),
    temperature = c("", "", "", "Temp. (f),"),
    precipitation = c("", "", "", "Precipitation,"),
    wind_speed = c("", "", "", "Wind Speed,")
  )
  df <- df %>%
    mutate(date = format(date, format = "%Y%m%d%H%M PST")) %>%
    mutate(across(everything(), paste_comma))
  bind_rows(header, df)
}

Next, I group the data frame by day and apply the function to each group. I use reframe() because I want to return multiple rows per group.

df_new <- df %>%
  group_by(day = floor_date(date, "day")) %>%
  reframe(add_header(pick(everything()))) %>%
  select(-day)

When you write the tsv file, make sure to omit the headers:

write_tsv(df_new, "data.tsv", col_names = FALSE)

This is how the output looks in Excel:

enter image description here

0
Grzegorz Sapijaszko On

Let's say the data is in excel which looks like:

openxlsx::read.xlsx("data/ala.xlsx",
                    colNames = FALSE)
#>                  X1        X2
#> 1          01_01_23      <NA>
#> 2              Time Temp. (f)
#> 3  202301010030 PST     42.39
#> 4  202301010045 PST     41.84
#> 5  202301010100 PST     41.78
#> 6          01_02_23      <NA>
#> 7              Time Temp. (f)
#> 8  202301020030 PST     43.39
#> 9  202301020045 PST     44.84
#> 10 202301020100 PST     45.78

To get rid off unnecessary rows use subset() with grepl(), then convert the datetime to date time format like:

openxlsx::read.xlsx("data/ala.xlsx",
                    colNames = FALSE) |>
  subset(grepl("PST$", X1)) |>
  dplyr::mutate(X1 = as.POSIXct(substr(X1, 1, 12), format = "%Y%m%d%H%M", tz = "PST")) |>
  dplyr::rename("Date" = X1, "Temperature" = X2)
#>                   Date Temperature
#> 3  2023-01-01 00:30:00       42.39
#> 4  2023-01-01 00:45:00       41.84
#> 5  2023-01-01 01:00:00       41.78
#> 8  2023-01-02 00:30:00       43.39
#> 9  2023-01-02 00:45:00       44.84
#> 10 2023-01-02 01:00:00       45.78

Created on 2024-01-27 with reprex v2.1.0

To save it as csv use write.csv() function