I have a vector V of consecutive integers of length l , e.g., 1, 2, 3, 4, 5, 6, 7. I want to find all subsets of size k such that the difference between any two numbers in the subset can be no less than m, e.g., 2. Using the example above with l = 7, k = 3, and m = 2, the subsets are
1, 3, 5
1, 3, 6
1, 3, 7
1, 4, 6
1, 4, 7
1, 5, 7
2, 4, 6
2, 4, 7
2, 5, 7
3, 5, 7
One approach is to enumerate all possible subsets of size k and discard any that fail to meet the m constraint, but this procedure explodes even when the number of solutions is small.
My current approach is a brute-force algorithm in which I start from the subset with the lowest possible integer and increase the last entry by 1 until the upper limit is reached, increment the previous entry and reset the last entry to the lowest it can be given the increase in the previous entry. That is, I start with 1, 3, 5, then increase the last digit by one to get 1, 3, 6 and 1, 3, 7, and then since the upper limit is reached I increment the middle by 1 (to 4) and set the last digit to the smallest it can be given that digit (to 6) to get 1, 4, 6, and carry on as such. This ends up being quite slow in R for large l, and I'm wondering if there is a clever vectorized solution that can instantly generate all the combinations, possible by capitalizing on the sequential nature of the entries. Implementing this algorithm in Rcpp speeds this up a bit, but I'm still hoping for a more elegant solution if one is available.
Here are several base R options
Recursion
We can define recursive function like below
and we can obtain
for-loops approachYou can simply run with
forloops like belowand will obtain
ReduceapproachAnother option is using
Reduce, which iteratively increases the dimension of the resultingdata.frameby adding eligible elements fromvas a new column.and you will obtain
Benchmarking
The benchmark includes all existing answers to this question
and we see
For a pressure test with
v <- 1:100,k <- 5andm <- 2, the performance of recursionsf(by @Allan Cameron) andf0(by @ThomasIsCoding) are shown as below