I have code which executes a large number of summarise calls, and it takes ages to run.
For example:
library(dplyr)

# Example data: letters is recycled across the 260 rows
df <- data.frame(Letter = letters, Num = 1:(26 * 10))

for (x in 1:10000) {
  # Overall total
  df_sum_Tot <- summarise(df, Sum_Num = sum(Num))
  # Total per letter
  df_sum_Letter <- summarise(df, Sum_Num = sum(Num), .by = Letter)
}
Is there a more efficient alternative to summarise I could use to speed it up?
If you're working with thousands of different datasets, you could put them all into a list and use lapply to summarise them all, rather than using a for loop.

Other packages can also be much more efficient for summarising than dplyr, especially with large datasets. For example, data.table or collapse:
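
As a rough sketch of the data.table route, here is how the two summaries from the question might look, followed by the lapply idea applied to a list. The list df_list is hypothetical and just stands in for "thousands of different datasets"; the object names are mine, not from the question. Whether this is actually faster for your data is worth benchmarking rather than assuming.

```r
library(data.table)

# Same example data as in the question (letters is recycled to 260 rows)
df <- data.frame(Letter = letters, Num = 1:(26 * 10))
dt <- as.data.table(df)

# Overall total, returned as a one-row data.table
dt_sum_tot <- dt[, .(Sum_Num = sum(Num))]

# Total per letter
dt_sum_letter <- dt[, .(Sum_Num = sum(Num)), by = Letter]

# Hypothetical list of datasets (assumption: each has Letter and Num columns)
df_list <- list(a = df, b = df[1:26, ])

# Summarise every dataset in the list instead of looping with for
sums_by_letter <- lapply(df_list, function(d) {
  as.data.table(d)[, .(Sum_Num = sum(Num)), by = Letter]
})
```

sums_by_letter is a named list with one summary table per input dataset, so sums_by_letter$a holds the per-letter totals for the first one.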