Subset a nested list by a nested list of indices

Question

Subset a nested list by a nested list of indices

98 Views Asked by Alkaligrass At 27 March 2024 at 16:47

I'm trying to subset this issue using R base. I have a list of names that I want to subset by severam nested lists of indices. The list of names comes in as very compact and straightforward, the list of indices is nested an the indices may have leaps, ie, some numbers can be non-consecutive; each sublist starts the count from 1.

The index numbers may go until ~80, but they're always less than the vector of names, ie. it's always possible to use them to subset the element within the List OfNames.

Data are like this for the various ListOfIndices_n:

ListOfIndices_1<-list(
                       Chr1 = list(c(1,2,3,4,5,6), c(7,8,9,10), c(11,12)),
                       Chr2 = list(c(4,5), c(7,8,9,10)),
                       Chr3 = list(c(1,2,3), c(5,6,7,8,9), c(10,11), c(14,15,16))
                      )


ListOfIndices_2<-list(
                       Chr1 = list(c(1,2),c(5,6,7),c(10,11,12)),
                       Chr2 = list(c(1,2,3), c(4,5), c(9,10), c(13,14,15,16)),
                       Chr3 = list(c(1,2), c(5,6,7), c(10,11), c(15,16)),
                       Chr4 = list(c(2,3,4), c(5,6), c(8,9,10,11), c(13,14,15,16))
                      )

and like this for ListOfNames:

ListOfNames<-list(
                  Chr1 = c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","A1","B2","C3","D4","E1","F2","G3","H4","I1"),
                  Chr2 = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","a1","b2","c3","d4","e5","f6","g7","h8","i9","j10","k11","l12","m13","n14","o15","p16","q17","r18","s19","t20","u1","v2","w3","x4","y5","z6"),
                  Chr3 = c("aA","bB","cC","dD","eE","fF","gG","hH","iI","jJ","kK","lL","mM","nN","oO","pP","qQ","rR","sS","tT","uU","vV","wW","xX","yY","zZ")
                  Chr4 = c("aA1","bB2","cC3","dD4","eE5","fF6","gG7","hH8","iI9","jJ10","kK11","lL12","mM13","nN14","oO15","pP16","qQ17","rR18","sS19","tT20","uU21","vV22","wW23","xX24","yY25","zZ26")
)

What I want to get are several ListOfIndicisedNames, each the result of ListOfNames subsetted by a different ListOfIndices_n:

ListOfIndicisedNames_1<-list(
                           Chr1 = list(c("A","B","C","D","E","F"),c("G","H","I","J"),c("K","L")),
                           Chr2 = list(c("d","e"), c("g","h","i","j")),
                           Chr3 = list(c("aA","bB","cC"), c("eE","fF","gG","hH","iI"), c("jJ","kK"),c("nN","oO","pP"))
)

ListOfIndicisedNames_2<-list(
                           Chr1 = list(c("A","B") c("E","F","G") c("J","K","L")),
                           Chr2 = list(c("a","b","c"), c("d","e"), c("i","h"), c("m","n","o","p"),
                           Chr3 = list(c("aA","bB"), c("eE","fF","gG"), c("iI","jJ"), c("oO","pP"))
                           Chr4 = list(c("bB2","cC3","dD4"), c("eE5","fF6"), c("hH8","iI9","jJ10","kK11"), c("mM13","nN14","oO15","pP16")
)

I've tried several flavours of lapply() and sapply(), though I didn't manage to get do.call() - I still find it very magical whenever I manage to make it work.

I think that another way could be gettig the result by a for loop with i and j for lists and sublists respectively, but every attempt has made R angry.

Could someone please help me in this maze?

Original Q&A

There are 4 best solutions below

SamR On 27 March 2024 at 16:55

Avoiding nested iteration

An approach to do this with out nesting lapply() or map() statements is to write a function which to subset a vector, v, by a list of vectors, l, without iteration. We can split() v by the groups defined by the indices in l, and then unlist() to extract the elements according to those indices. Finally, we can use rep() to repeat the indices as needed to ensure proper grouping. unname() simply strips the unnecessary names that are assigned by split().

subset_vector_by_list <- function(v, l) {
    split(
        v[unlist(l)],
        rep(seq_along(len <- lengths(l)), len),
    ) |> unname()
}

Then you can write a function to iterate over your list of indices:

subset_list_by_list <- function(names_l, index_l) {
    lapply(names(index_l), \(nm, nm_list = names_l, l = index_l)
    subset_vector_by_list(nm_list[[nm]], l[[nm]]))
}

Then again you can just apply this to your relevant lists:

subset_list_by_list(ListOfNames, ListOfIndices_1)
List of 3
 $ :List of 3
  ..$ : chr [1:6] "A" "B" "C" "D" ...
  ..$ : chr [1:4] "G" "H" "I" "J"
  ..$ : chr [1:2] "K" "L"
 $ :List of 2
  ..$ : chr [1:2] "d" "e"
  ..$ : chr [1:4] "g" "h" "i" "j"
 $ :List of 4
  ..$ : chr [1:3] "aA" "bB" "cC"
  ..$ : chr [1:5] "eE" "fF" "gG" "hH" ...
  ..$ : chr [1:2] "jJ" "kK"
  ..$ : chr [1:3] "nN" "oO" "pP"

And

subset_list_by_list(ListOfNames, ListOfIndices_2)
[[1]]
[[1]][[1]]
[1] "A" "B"

[[1]][[2]]
[1] "E" "F" "G"

[[1]][[3]]
[1] "J" "K" "L"


[[2]]
[[2]][[1]]
[1] "a" "b" "c"

[[2]][[2]]
[1] "d" "e"

[[2]][[3]]
[1] "i" "j"

[[2]][[4]]
[1] "m" "n" "o" "p"


[[3]]
[[3]][[1]]
[1] "aA" "bB"

[[3]][[2]]
[1] "eE" "fF" "gG"

[[3]][[3]]
[1] "jJ" "kK"

[[3]][[4]]
[1] "oO" "pP"


[[4]]
[[4]][[1]]
[1] "bB2" "cC3" "dD4"

[[4]][[2]]
[1] "eE5" "fF6"

[[4]][[3]]
[1] "hH8"  "iI9"  "jJ10" "kK11"

[[4]][[4]]
[1] "mM13" "nN14" "oO15" "pP16"

Carl On 27 March 2024 at 18:24

A tidyverse option:

library(tidyverse)

ListOfIndices<-list(
  Chr1 = list(c(1,2,3,4,5,6), c(7,8,9,10), c(11,12)),
  Chr2 = list(c(4,5), c(7,8,9,10)),
  Chr3 = list(c(1,2,3), c(5,6,7,8,9), c(10,11), c(14,15,16))
)

ListOfNames<-list(
  Chr1 = c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","A1","B2","C3","D4","E1","F2","G3","H4","I1"),
  Chr2 = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","a1","b2","c3","d4","e5","f6","g7","h8","i9","j10","k11","l12","m13","n14","o15","p16","q17","r18","s19","t20","u1","v2","w3","x4","y5","z6"),
  Chr3 = c("aA","bB","cC","dD","eE","fF","gG","hH","iI","jJ","kK","lL","mM","nN","oO","pP","qQ","rR","sS","tT","uU","vV","wW","xX","yY","zZ"))

new <- map2(ListOfIndices, ListOfNames, \(x, y) map(x, \(x) (y)[x]))
str(new)
#> List of 3
#>  $ Chr1:List of 3
#>   ..$ : chr [1:6] "A" "B" "C" "D" ...
#>   ..$ : chr [1:4] "G" "H" "I" "J"
#>   ..$ : chr [1:2] "K" "L"
#>  $ Chr2:List of 2
#>   ..$ : chr [1:2] "d" "e"
#>   ..$ : chr [1:4] "g" "h" "i" "j"
#>  $ Chr3:List of 4
#>   ..$ : chr [1:3] "aA" "bB" "cC"
#>   ..$ : chr [1:5] "eE" "fF" "gG" "hH" ...
#>   ..$ : chr [1:2] "jJ" "kK"
#>   ..$ : chr [1:3] "nN" "oO" "pP"

^{Created on 2024-03-27 with reprex v2.1.0}

Alper Göktuğ Tamer On 27 March 2024 at 20:40

Here it is, it is a little bit complicated but it works

ListOfIndices<-list(
                   Chr1 <- list(c(1,2,3,4,5,6), c(7,8,9,10), c(11,12)),
                   Chr2 <- list(c(4,5), c(7,8,9,10)),
                   Chr3 <- list(c(1,2,3), c(5,6,7,8,9), c(10,11), c(14,15,16))
                  )
ListOfNames<-list(
              Chr1 <- c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","A1","B2","C3","D4","E1","F2","G3","H4","I1"),
              Chr2 <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","a1","b2","c3","d4","e5","f6","g7","h8","i9","j10","k11","l12","m13","n14","o15","p16","q17","r18","s19","t20","u1","v2","w3","x4","y5","z6"),
              Chr3 <- 
c("aA","bB","cC","dD","eE","fF","gG","hH","iI","jJ","kK","lL","mM","nN","oO"," 
pP","qQ","rR","sS","tT","uU","vV","wW","xX","yY","zZ"))
ListOfIndicisedNames<-list(
                       Chr1 <- 
list(c("A","B","C","D","E","F"),c("G","H","I","J"),c("K","L")),
                       Chr2 <- list(c("d","e"), c("g","h","i","j")),
                       Chr3 <- list(c("aA","bB","cC"), 
c("eE","fF","gG","hH","iI"), c("jJ","kK"),c("nN","oO","pP"))
)

# Checking the length of the lists to be matched
if (length(ListOfIndices) != length(ListOfIndicisedNames)) {
stop("Lengths of lists do not match!")
}

# Create an empty data frame
result_table <- data.frame()

# Matching indexes and names for each chromosome
for (i in seq_along(ListOfIndices)) {
# Get indexes and names for each chromosome
indexes <- unlist(ListOfIndices[[i]])
names <- unlist(ListOfIndicisedNames[[i]])

# Match names with indexes
matched_table <- data.frame(Indeks = indexes, Name = names)

# Merge to add to the result table
result_table <- rbind(result_table, matched_table)
}

print(result_table)

**Soren** · Accepted Answer · 2024-03-27T17:03:59.783000

As the list of names and list of indexes are the same lengths, these can be evenly iterated over using mapply and each sub-list can then be iterated over with lapply

mapply(function(indexes,names) {
  lapply(indexes,function(indexes,names) { names[indexes] },names=names)  
},indexes=ListOfIndices,ListOfNames)

Also, the example suggests that you want to assign names to the list? Try using "=" instead of "<-" within the list definition:

ListOfIndices<-list(
                       Chr1 = list(c(1,2,3,4,5,6), c(7,8,9,10), c(11,12)),
                       Chr2 = list(c(4,5), c(7,8,9,10)),
                       Chr3 = list(c(1,2,3), c(5,6,7,8,9), c(10,11), c(14,15,16))
                      )

Following updates in response to additional discussions

(1) Assign names to the various lists (see comment above about using "=" instead of "<-"

(2) If different index variables are being passed in different circumstances, they can be unified around their common names (Chr1,Chr2, etc...)

(3) These can then be extracted from the list of names and list of indexes and their order will be preserved in the same order of the matched names, and also ensure their same lengths

For ListOfIndices_1:

which_names <- intersect(names(ListOfIndices_1),names(ListOfNames))
mapply(function(indexes,names) {
  lapply(indexes,function(indexes,names) { names[indexes] },names=names)  
},
indexes=ListOfIndices_1[which_names],
names=ListOfNames[which_names])

And for ListOfIndices2:

which_names <- intersect(names(ListOfIndices_2),names(ListOfNames))
mapply(function(indexes,names) {
  lapply(indexes,function(indexes,names) { names[indexes] },names=names)  
},
indexes=ListOfIndices_2[which_names],
names=ListOfNames[which_names])

Subset a nested list by a nested list of indices

There are 4 best solutions below

Avoiding nested iteration

Related Questions in R

Related Questions in LIST

Related Questions in INDEXING

Related Questions in NESTED

Related Questions in SUBSET

Trending Questions

Popular # Hahtags

Popular Questions