I have a list containing 4438 data frames of different sizes. I am not sure how to make a reproducible example, but I obtained the list by applying the expand.grid function to each element of an original list, which gives a data frame with all possible combinations of its elements:
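library(purrr)  # provides map_depth() and re-exports the %>% pipe
# Apply expand.grid() to each top-level element of the list
citation <- citation %>%
  map_depth(., 1, expand.grid)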
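Since the question has no reproducible example, here is a minimal sketch of a toy list with the same shape. The contents are hypothetical (two groups, made-up candidate values A to D), and it uses base lapply() instead of purrr so it runs without extra packages: map_depth(x, 1, f) applies f to each top-level element, which is what lapply(x, f) does here.

```r
# Hypothetical toy input: each top-level element is one group, holding
# one vector of candidate values per position
citation <- list(
  list(c("A", "B"), c("B", "C"), c("A", "C")),
  list("A", c("A", "B"), c("A", "C", "D"))
)

# Base-R equivalent of map_depth(., 1, expand.grid): expand.grid() receives
# each inner list and crosses its components
citation <- lapply(citation, expand.grid)

sapply(citation, nrow)  # 8 6  (2 x 2 x 2 and 1 x 2 x 3 combinations)
```

Each resulting data frame has one column per position and one row per combination, like the real list described above.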
[Screenshot: the list before applying expand.grid]

[Screenshot: the list after applying expand.grid]

What I want to do is, for each data frame, count the number of unique values in each row and then find the minimum of those counts.
First, I wrote the function below:
fun1 <- function(res){
  # number of distinct values in each row, then the smallest of those counts
  min(apply(res, 1, function(x) length(unique(x))))
}
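fun1 relies on apply(), which converts the data frame to a matrix and then calls an R function once per row, so with thousands of data frames that per-row overhead is likely a large part of the cost. For comparison, here is a sketch (not from the question or the answer) that computes the same per-row counts by comparing whole columns, which should be quick when there are few columns. It assumes there are no NA values; row_n_unique, fun2 and the toy data frame are hypothetical names.

```r
# Original row-wise version, as in the question
fun1 <- function(res) {
  min(apply(res, 1, function(x) length(unique(x))))
}

# Vectorized alternative (assumes no NA values): a value in column j is
# "new" for a row if it differs from that row's values in all earlier columns
row_n_unique <- function(res) {
  cols <- lapply(res, as.character)  # compare values as character labels
  counts <- rep(1L, nrow(res))       # the first column always adds one value
  for (j in seq_along(cols)[-1]) {
    is_new <- rep(TRUE, nrow(res))
    for (i in seq_len(j - 1)) {
      is_new <- is_new & (cols[[j]] != cols[[i]])
    }
    counts <- counts + is_new        # logical adds as 0/1
  }
  counts
}

fun2 <- function(res) min(row_n_unique(res))

toy <- expand.grid(a = c("x", "y"), b = c("x", "z"))
row_n_unique(toy)  # 1 2 2 2
fun2(toy)          # 1
```

The work grows with the square of the number of columns but each step is a vectorized comparison over all rows, which is the same idea as tabulating values per row with a dedicated function.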
Then I applied the function to each data frame:
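library(furrr)  # also attaches future, which provides plan()
plan(multisession, workers = 4)  # run in 4 parallel R sessions
# one minimum per data frame, returned as a numeric vector
min_set <- citation %>% future_map_dbl(fun1)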
However, the calculation is very slow: it takes almost 30 minutes to complete. I would like to find a faster way to do this. Looking forward to hearing your suggestions. Thank you in advance!
To speed up the current approach of enumerating the combinations, use rowTabulate from the Rfast package (or rowTabulates from the matrixStats package). However, it will be much faster to get the desired results with the setcover function in the adagio package, which solves the set cover problem directly (i.e., without the use of expand.grid) via integer linear programming with lp from the lpSolve package.
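The set cover reformulation works because each row of expand.grid picks one value per position, so the distinct values in that row form a set of values that covers every position; conversely, any set of values that covers every position yields a row using only values from that set. The minimum number of unique values per row is therefore the size of the smallest covering set. Below is a minimal base-R sketch of that reformulation for one group. It uses an exhaustive search in place of the ILP solver so it runs without extra packages; min_cover_size and the toy data are hypothetical.

```r
# Minimum over expand.grid() rows of "number of distinct values in a row",
# computed without expand.grid() by treating it as a set cover problem.
# Exhaustive search: only suitable for small toy inputs.
min_cover_size <- function(sets) {
  values <- sort(unique(unlist(sets)))
  # 0/1 incidence matrix: rows = candidate values, columns = positions;
  # 1 means the value is allowed at (i.e. covers) that position
  incidence <- matrix(0L, nrow = length(values), ncol = length(sets),
                      dimnames = list(values, NULL))
  for (i in seq_along(sets)) {
    incidence[values %in% sets[[i]], i] <- 1L
  }
  # Try subsets of values in increasing size; the first subset that
  # covers every position has the minimum size
  for (k in seq_along(values)) {
    for (idx in combn(length(values), k, simplify = FALSE)) {
      if (all(colSums(incidence[idx, , drop = FALSE]) > 0)) return(k)
    }
  }
  NA_integer_  # only reached if some position has no candidate values
}

# Hypothetical group: position i may take any value in sets[[i]]
sets <- list(c("A", "B"), c("B", "C"), c("A", "C"))
min_cover_size(sets)  # 2: no single value appears in all three positions
```

The exhaustive loop is exponential in the number of distinct values, so it only illustrates the formulation; the suggestion above is to hand this kind of 0/1 matrix to an ILP solver (setcover from adagio, which uses lp from lpSolve) instead. Check the adagio documentation for the matrix orientation setcover expects.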