Split a string into consecutive substrings of size n in R in an efficient way

# Input
n <- 2
"abcd" 
# Output
c("ab", "bc", "cd")

I would like to avoid a for loop or sapply.

Ronak Shah (best answer)

You may use substring, which is vectorized over its start and stop positions:

get_n_grams <- function(string, n) {
  len <- nchar(string)
  # Start positions 1..(len - n + 1) pair up with end positions n..len
  substring(string, seq_len(len - n + 1), n:len)
}

get_n_grams("abcd", 2)
#[1] "ab" "bc" "cd"

get_n_grams("abcd", 3)
#[1] "abc" "bcd"
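
The function above assumes n is no larger than the number of characters. For example, when n exceeds nchar(string) by more than one, seq_len() receives a negative length and raises an error. Below is a minimal sketch of a guarded variant. The name get_n_grams_safe is hypothetical, and returning character(0) for an oversized n is an assumption about the desired behavior, not something stated in the answer.

```r
# Hypothetical guarded variant of get_n_grams (illustrative name).
get_n_grams_safe <- function(string, n) {
  # Expect a single non-missing string and a positive window size
  stopifnot(is.character(string), length(string) == 1L, !is.na(string), n >= 1)
  len <- nchar(string)
  # Assumption: an n longer than the string yields no n-grams at all
  if (n > len) return(character(0))
  starts <- seq_len(len - n + 1)
  # Each window runs from its start position to start + n - 1
  substring(string, starts, starts + n - 1)
}

get_n_grams_safe("abcd", 2)
#[1] "ab" "bc" "cd"

get_n_grams_safe("abcd", 5)
#character(0)
```

Computing the end positions as starts + n - 1 keeps both vectors the same length by construction, which may be easier to reason about than pairing seq_len() with n:len.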
ThomasIsCoding

This embed trick also works, but it is probably less efficient than the substring approach by @Ronak Shah:

> n <- 2
> s <- "abcd"
> apply(embed(utf8ToInt(s), n)[, n:1, drop = FALSE], 1, intToUtf8)
[1] "ab" "bc" "cd"
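
To see why the columns are reversed, it may help to look at the intermediate matrix. The sketch below uses only base R. The variable names codes and m are introduced here purely for illustration.

```r
s <- "abcd"
n <- 2

codes <- utf8ToInt(s)   # integer code points: 97 98 99 100
m <- embed(codes, n)    # one row per window, latest element first
m
#      [,1] [,2]
# [1,]   98   97
# [2,]   99   98
# [3,]  100   99

# Reverse the columns so each row reads left to right. drop = FALSE keeps
# the matrix shape even when only one row or one column remains.
m <- m[, n:1, drop = FALSE]
apply(m, 1, intToUtf8)
# [1] "ab" "bc" "cd"
```

Note that apply still iterates over the rows at the R level, which is one plausible reason this route tends to be slower than a single vectorized substring call.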