Subset a nested list by a nested list of indices

98 Views Asked by At

I'm trying to subset this issue using R base. I have a list of names that I want to subset by severam nested lists of indices. The list of names comes in as very compact and straightforward, the list of indices is nested an the indices may have leaps, ie, some numbers can be non-consecutive; each sublist starts the count from 1.

The index numbers may go until ~80, but they're always less than the vector of names, ie. it's always possible to use them to subset the element within the List OfNames.

Data are like this for the various ListOfIndices_n:

ListOfIndices_1<-list(
                       Chr1 = list(c(1,2,3,4,5,6), c(7,8,9,10), c(11,12)),
                       Chr2 = list(c(4,5), c(7,8,9,10)),
                       Chr3 = list(c(1,2,3), c(5,6,7,8,9), c(10,11), c(14,15,16))
                      )


ListOfIndices_2<-list(
                       Chr1 = list(c(1,2),c(5,6,7),c(10,11,12)),
                       Chr2 = list(c(1,2,3), c(4,5), c(9,10), c(13,14,15,16)),
                       Chr3 = list(c(1,2), c(5,6,7), c(10,11), c(15,16)),
                       Chr4 = list(c(2,3,4), c(5,6), c(8,9,10,11), c(13,14,15,16))
                      )

and like this for ListOfNames:

ListOfNames<-list(
                  Chr1 = c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","A1","B2","C3","D4","E1","F2","G3","H4","I1"),
                  Chr2 = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","a1","b2","c3","d4","e5","f6","g7","h8","i9","j10","k11","l12","m13","n14","o15","p16","q17","r18","s19","t20","u1","v2","w3","x4","y5","z6"),
                  Chr3 = c("aA","bB","cC","dD","eE","fF","gG","hH","iI","jJ","kK","lL","mM","nN","oO","pP","qQ","rR","sS","tT","uU","vV","wW","xX","yY","zZ")
                  Chr4 = c("aA1","bB2","cC3","dD4","eE5","fF6","gG7","hH8","iI9","jJ10","kK11","lL12","mM13","nN14","oO15","pP16","qQ17","rR18","sS19","tT20","uU21","vV22","wW23","xX24","yY25","zZ26")
)

What I want to get are several ListOfIndicisedNames, each the result of ListOfNames subsetted by a different ListOfIndices_n:

ListOfIndicisedNames_1<-list(
                           Chr1 = list(c("A","B","C","D","E","F"),c("G","H","I","J"),c("K","L")),
                           Chr2 = list(c("d","e"), c("g","h","i","j")),
                           Chr3 = list(c("aA","bB","cC"), c("eE","fF","gG","hH","iI"), c("jJ","kK"),c("nN","oO","pP"))
)

ListOfIndicisedNames_2<-list(
                           Chr1 = list(c("A","B") c("E","F","G") c("J","K","L")),
                           Chr2 = list(c("a","b","c"), c("d","e"), c("i","h"), c("m","n","o","p"),
                           Chr3 = list(c("aA","bB"), c("eE","fF","gG"), c("iI","jJ"), c("oO","pP"))
                           Chr4 = list(c("bB2","cC3","dD4"), c("eE5","fF6"), c("hH8","iI9","jJ10","kK11"), c("mM13","nN14","oO15","pP16")
)

I've tried several flavours of lapply() and sapply(), though I didn't manage to get do.call() - I still find it very magical whenever I manage to make it work.

I think that another way could be gettig the result by a for loop with i and j for lists and sublists respectively, but every attempt has made R angry.

Could someone please help me in this maze?

4

There are 4 best solutions below

7
Soren On BEST ANSWER

As the list of names and list of indexes are the same lengths, these can be evenly iterated over using mapply and each sub-list can then be iterated over with lapply

mapply(function(indexes,names) {
  lapply(indexes,function(indexes,names) { names[indexes] },names=names)  
},indexes=ListOfIndices,ListOfNames)

Also, the example suggests that you want to assign names to the list? Try using "=" instead of "<-" within the list definition:

ListOfIndices<-list(
                       Chr1 = list(c(1,2,3,4,5,6), c(7,8,9,10), c(11,12)),
                       Chr2 = list(c(4,5), c(7,8,9,10)),
                       Chr3 = list(c(1,2,3), c(5,6,7,8,9), c(10,11), c(14,15,16))
                      )

Following updates in response to additional discussions

(1) Assign names to the various lists (see comment above about using "=" instead of "<-"

(2) If different index variables are being passed in different circumstances, they can be unified around their common names (Chr1,Chr2, etc...)

(3) These can then be extracted from the list of names and list of indexes and their order will be preserved in the same order of the matched names, and also ensure their same lengths

For ListOfIndices_1:

which_names <- intersect(names(ListOfIndices_1),names(ListOfNames))
mapply(function(indexes,names) {
  lapply(indexes,function(indexes,names) { names[indexes] },names=names)  
},
indexes=ListOfIndices_1[which_names],
names=ListOfNames[which_names])

And for ListOfIndices2:

which_names <- intersect(names(ListOfIndices_2),names(ListOfNames))
mapply(function(indexes,names) {
  lapply(indexes,function(indexes,names) { names[indexes] },names=names)  
},
indexes=ListOfIndices_2[which_names],
names=ListOfNames[which_names])
0
SamR On

Avoiding nested iteration

An approach to do this with out nesting lapply() or map() statements is to write a function which to subset a vector, v, by a list of vectors, l, without iteration. We can split() v by the groups defined by the indices in l, and then unlist() to extract the elements according to those indices. Finally, we can use rep() to repeat the indices as needed to ensure proper grouping. unname() simply strips the unnecessary names that are assigned by split().

subset_vector_by_list <- function(v, l) {
    split(
        v[unlist(l)],
        rep(seq_along(len <- lengths(l)), len),
    ) |> unname()
}

Then you can write a function to iterate over your list of indices:

subset_list_by_list <- function(names_l, index_l) {
    lapply(names(index_l), \(nm, nm_list = names_l, l = index_l)
    subset_vector_by_list(nm_list[[nm]], l[[nm]]))
}

Then again you can just apply this to your relevant lists:

subset_list_by_list(ListOfNames, ListOfIndices_1)
List of 3
 $ :List of 3
  ..$ : chr [1:6] "A" "B" "C" "D" ...
  ..$ : chr [1:4] "G" "H" "I" "J"
  ..$ : chr [1:2] "K" "L"
 $ :List of 2
  ..$ : chr [1:2] "d" "e"
  ..$ : chr [1:4] "g" "h" "i" "j"
 $ :List of 4
  ..$ : chr [1:3] "aA" "bB" "cC"
  ..$ : chr [1:5] "eE" "fF" "gG" "hH" ...
  ..$ : chr [1:2] "jJ" "kK"
  ..$ : chr [1:3] "nN" "oO" "pP"

And

subset_list_by_list(ListOfNames, ListOfIndices_2)
[[1]]
[[1]][[1]]
[1] "A" "B"

[[1]][[2]]
[1] "E" "F" "G"

[[1]][[3]]
[1] "J" "K" "L"


[[2]]
[[2]][[1]]
[1] "a" "b" "c"

[[2]][[2]]
[1] "d" "e"

[[2]][[3]]
[1] "i" "j"

[[2]][[4]]
[1] "m" "n" "o" "p"


[[3]]
[[3]][[1]]
[1] "aA" "bB"

[[3]][[2]]
[1] "eE" "fF" "gG"

[[3]][[3]]
[1] "jJ" "kK"

[[3]][[4]]
[1] "oO" "pP"


[[4]]
[[4]][[1]]
[1] "bB2" "cC3" "dD4"

[[4]][[2]]
[1] "eE5" "fF6"

[[4]][[3]]
[1] "hH8"  "iI9"  "jJ10" "kK11"

[[4]][[4]]
[1] "mM13" "nN14" "oO15" "pP16"
1
Carl On

A tidyverse option:

library(tidyverse)

ListOfIndices<-list(
  Chr1 = list(c(1,2,3,4,5,6), c(7,8,9,10), c(11,12)),
  Chr2 = list(c(4,5), c(7,8,9,10)),
  Chr3 = list(c(1,2,3), c(5,6,7,8,9), c(10,11), c(14,15,16))
)

ListOfNames<-list(
  Chr1 = c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","A1","B2","C3","D4","E1","F2","G3","H4","I1"),
  Chr2 = c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","a1","b2","c3","d4","e5","f6","g7","h8","i9","j10","k11","l12","m13","n14","o15","p16","q17","r18","s19","t20","u1","v2","w3","x4","y5","z6"),
  Chr3 = c("aA","bB","cC","dD","eE","fF","gG","hH","iI","jJ","kK","lL","mM","nN","oO","pP","qQ","rR","sS","tT","uU","vV","wW","xX","yY","zZ"))

new <- map2(ListOfIndices, ListOfNames, \(x, y) map(x, \(x) (y)[x]))
str(new)
#> List of 3
#>  $ Chr1:List of 3
#>   ..$ : chr [1:6] "A" "B" "C" "D" ...
#>   ..$ : chr [1:4] "G" "H" "I" "J"
#>   ..$ : chr [1:2] "K" "L"
#>  $ Chr2:List of 2
#>   ..$ : chr [1:2] "d" "e"
#>   ..$ : chr [1:4] "g" "h" "i" "j"
#>  $ Chr3:List of 4
#>   ..$ : chr [1:3] "aA" "bB" "cC"
#>   ..$ : chr [1:5] "eE" "fF" "gG" "hH" ...
#>   ..$ : chr [1:2] "jJ" "kK"
#>   ..$ : chr [1:3] "nN" "oO" "pP"

Created on 2024-03-27 with reprex v2.1.0

0
Alper Göktuğ Tamer On

Here it is, it is a little bit complicated but it works

ListOfIndices<-list(
                   Chr1 <- list(c(1,2,3,4,5,6), c(7,8,9,10), c(11,12)),
                   Chr2 <- list(c(4,5), c(7,8,9,10)),
                   Chr3 <- list(c(1,2,3), c(5,6,7,8,9), c(10,11), c(14,15,16))
                  )
ListOfNames<-list(
              Chr1 <- c("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","A1","B2","C3","D4","E1","F2","G3","H4","I1"),
              Chr2 <- c("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","a1","b2","c3","d4","e5","f6","g7","h8","i9","j10","k11","l12","m13","n14","o15","p16","q17","r18","s19","t20","u1","v2","w3","x4","y5","z6"),
              Chr3 <- 
c("aA","bB","cC","dD","eE","fF","gG","hH","iI","jJ","kK","lL","mM","nN","oO"," 
pP","qQ","rR","sS","tT","uU","vV","wW","xX","yY","zZ"))
ListOfIndicisedNames<-list(
                       Chr1 <- 
list(c("A","B","C","D","E","F"),c("G","H","I","J"),c("K","L")),
                       Chr2 <- list(c("d","e"), c("g","h","i","j")),
                       Chr3 <- list(c("aA","bB","cC"), 
c("eE","fF","gG","hH","iI"), c("jJ","kK"),c("nN","oO","pP"))
)

# Checking the length of the lists to be matched
if (length(ListOfIndices) != length(ListOfIndicisedNames)) {
stop("Lengths of lists do not match!")
}

# Create an empty data frame
result_table <- data.frame()

# Matching indexes and names for each chromosome
for (i in seq_along(ListOfIndices)) {
# Get indexes and names for each chromosome
indexes <- unlist(ListOfIndices[[i]])
names <- unlist(ListOfIndicisedNames[[i]])

# Match names with indexes
matched_table <- data.frame(Indeks = indexes, Name = names)

# Merge to add to the result table
result_table <- rbind(result_table, matched_table)
}

print(result_table)