Create sequence of repeated values, with length based on a vector


How can I populate the column ‘Night’ with a sequence of numbers, each repeated 3 times, with the sequence restarting for each value of the column ‘Site’? The table below shows what I want to achieve. This is a simplified version of my problem; I need to be able to use the code on a much larger data frame.


Site_date_time Site Night
1_01012023_2200 1 1
1_01012023_2300 1 1
1_02012023_0000 1 1
1_02012023_2200 1 2
1_02012023_2300 1 2
1_03012023_0000 1 2
2_01012023_2100 2 1
2_01012023_2200 2 1
2_01012023_2300 2 1
2_02012023_2200 2 2
2_02012023_2300 2 2
2_03012023_0000 2 2
2_03012023_2200 2 3
2_03012023_2300 2 3
2_04012023_0000 2 3
#Code to create basic data frame of Site
site <- c(rep(1,times=6), rep(2,times=9))
df <- data.frame(site)

My main issue is that the number of records per site varies, so the length of the sequence before it restarts also varies. I could use the following if every site had the same number of rows.

library("dplyr")
library("data.table")

# Create data frame of the site vector, with the number of observations per site of equal length
site <- c(rep(1,times=6), rep(2,times=6))
df <- data.frame(site)
# Create sequence with repeated numbers (6 rows per site = 2 nights of 3 records each)
group_by(df, site) %>% mutate(night = rep(1:2, each = 3))

But I need a function that allows me to create a sequence with repeated numbers based on the length of my grouped vector, rather than a defined length. I've tried to find a way of combining rep() with seq_along() or rowid(), but have had no luck.

SamR On BEST ANSWER

You can use the length.out argument of rep(). From the docs:

length.out: non-negative integer. The desired length of the output vector. Other inputs will be coerced to a double vector and the first element taken. Ignored if NA or invalid.
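
To see what length.out does in isolation, here is a minimal base R sketch. The group size of 9 and the variable names are chosen for illustration only; they stand in for one site with 9 records.

```r
# Without length.out, repeating each of 1:9 three times yields 27 values
full <- rep(seq_len(9), each = 3)
length(full)

# With length.out = 9, the result is cut off after the first 9 values
night <- rep(seq_len(9), each = 3, length.out = 9)
night
# [1] 1 1 1 2 2 2 3 3 3

# A group size that is not a multiple of 3 just ends with a shorter run
partial <- rep(seq_len(7), each = 3, length.out = 7)
partial
# [1] 1 1 1 2 2 2 3
```

Passing the group size as both the upper bound of the sequence and length.out guarantees the sequence is always long enough before it is truncated.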

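The length of your grouped vector, i.e. the number of rows in the current group, is given by dplyr::n(). Note that the .by argument used below requires dplyr 1.1.0 or later; with older versions, use group_by(site) before mutate() instead.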

library(dplyr)

df |>
    mutate(night = rep(seq_len(n()), each = 3, length.out = n()), .by = site)
#    site night
# 1     1     1
# 2     1     1
# 3     1     1
# 4     1     2
# 5     1     2
# 6     1     2
# 7     2     1
# 8     2     1
# 9     2     1
# 10    2     2
# 11    2     2
# 12    2     2
# 13    2     3
# 14    2     3
# 15    2     3
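
As a cross-check, the same numbering can be sketched in base R without dplyr. This is a different technique from the one above, not a restatement of it: it uses integer division on the within-group row index rather than rep(). The data frame is rebuilt here so the snippet runs on its own.

```r
# Rebuild the example data: 6 records for site 1, 9 for site 2
df <- data.frame(site = c(rep(1, times = 6), rep(2, times = 9)))

# ave() applies FUN to the row positions of each site separately.
# Within a site, rows 1-3 map to night 1, rows 4-6 to night 2, and so on.
df$night <- ave(
  seq_along(df$site),
  df$site,
  FUN = function(i) (seq_along(i) - 1L) %/% 3L + 1L
)

df$night
# [1] 1 1 1 2 2 2 1 1 1 2 2 2 3 3 3
```

Both approaches assume the rows are already ordered by time within each site.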

Also, as you included library(data.table) in your question, if df is a data.table you can use the same approach with the data.table syntax, using .N rather than n():

df[, night := rep(seq_len(.N), each = 3, length.out = .N), site]
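
If df is still a plain data.frame, it needs converting first. A minimal self-contained sketch, assuming the data.table package is installed and using setDT() to convert in place:

```r
library(data.table)

# Rebuild the example data as an ordinary data.frame
df <- data.frame(site = c(rep(1, times = 6), rep(2, times = 9)))

# Convert to a data.table by reference (no copy is made)
setDT(df)

# .N is the number of rows in the current by-group
df[, night := rep(seq_len(.N), each = 3, length.out = .N), by = site]

df$night
# [1] 1 1 1 2 2 2 1 1 1 2 2 2 3 3 3
```

Because := modifies df by reference, there is no need to assign the result back to df.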