Firstly, I wish I could think of a better way to describe my problem. I'm trying to take the following example list:
lst<- list(c(2), c(1,3), c(2), c(4), c(6), c(5, 7), c(6,8), c(7))
And turn it into this:
res<- list(c(1,2,3), c(1,2,3), c(1,2,3), c(4), c(5,6,7,8), c(5,6,7,8),c(5,6,7,8), c(5,6,7,8))
OR even this:
also_good_answer<- list(c(1, 2, 3), logical(0), logical(0), 4, c(5, 6, 7, 8), logical(0),
logical(0), logical(0))
I've been using a couple while statements in combination with setdiff to do this, but I'm wondering if there is an eloquent and faster way to do this over a large list?
As always, thank you in advance. -nate
My method:
lst<- list(c(2), c(1,3), c(2), c(4), c(6), c(5, 7), c(6,8), c(7)) # Original List
to_sequence<- 1:length(lst)
res<- lapply(1:length(lst), function(x) {return(NA)}) # Building Result Object
while(length(to_sequence) > 0){
tres<- c()
idx<- to_sequence[1]
next_idxs<- setdiff(lst[[idx]], NA)
tres<- c(tres, next_idxs)
while(length(next_idxs) >=1){
next_idxs<- lapply(seq_along(next_idxs), function(x){
lst[[next_idxs[x]]]
} ) %>% unlist() %>% setdiff(., unlist(tres)) # uses slow setdiff
tres<-c(tres, next_idxs)
}
res[[idx]]<- tres
to_sequence<- setdiff(to_sequence, tres) # Another slow setdiff
cat("Length to_sequence:", base::prettyNum(length(to_sequence),big.mark = ","), "\n")
}
res<- lapply(res, sort)
EDITS
Updated lst:
lst<- list(c(2), c(1,3), c(2,8), c(4), c(6), c(5, 7), c(6,9), c(3),c(7))
Giving Desired Output:
list(c(1, 2, 3, 8), logical(0), logical(0), 4, c(5, 6, 7, 9), logical(0), logical(0), logical(0), logical(0))

The revised question includes commentary that implies that the problem can be interpreted as finding the connected components of an undirected graph described by an input list of vectors of nodes,
lst, such thatlst[[i]]is a vector of the nodes connected to node i.To compute the result, using the input
lst(defined in the question and also shown in the Note at the end), first build the Rcpp_SimplexTree class graph object,st, extract theconnected_componentsproperty, which in this case isc(1, 1, 1, 3, 5, 5, 5, 1, 5)and replace each component oflstwith the unique ordered set of nodes in its connected component.Finally replace duplicate elements in that with
logical(0). The ## line can be omitted if duplicates are ok.Old
In light of the revised explanation in the question this is no longer relevant. Use above instead.
I am not completely sure I understand the processing in the question but the following code does not use
setdiff(although it does useintersect) and there are no explicit loops giving the same result for the example.okis TRUE for each element for which all elements before and after it -- the after includes the element itself - are disjoint.cumsum(ok)gives a grouping vector -c(1, 1, 1, 2, 3, 3, 3, 3)-- and then we create a listLwhich contains the group components themselves and finally repeat it the required number of times. Another thing to try is to see whether Rfast2::Intersect or easy.utils::fastIntersect speeds up the intersect operation.giving
Note