Estimate the Expected Run Time of a loop or nested loops

I am using the R programming language. I am interested in knowing if there is a way to estimate the actual run time of a procedure (relative to the "strength" of your computer) without actually running that procedure.

For example, suppose I want to determine how long the procedure below takes to run on my computer:

library(caret)   # confusionMatrix()
library(rpart)   # rpart()

#generate data

a = rnorm(80000, 10, 10)
b = rnorm(80000, 10, 5)
c = rnorm(80000, 5, 10)
group <- sample( LETTERS[1:2], 80000, replace=TRUE, prob=c(0.5,0.5))
group_1 <- 1:80000

#put data into a frame
d = data.frame(a,b,c, group, group_1)
d$group = as.factor(d$group)

e <- d
vec1 <- sample(200:300, 5)
vec2 <- sample(400:500,5)
vec3 <- sample(700:800,5)
z <- 0
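# expand.grid() varies its first argument fastest, so list vec3 first to match
# the loop order below (k is the innermost loop) and keep Accuracy on the right row
df <- expand.grid(vec3 = vec3, vec2 = vec2, vec1 = vec1)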
df$Accuracy <- NA

for (i in seq_along(vec1)) { 
    for (j in seq_along(vec2)) {
        for (k in seq_along(vec3)) {
            # d <- e
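            # Band each row: 0 below vec1[i], 1 below vec2[j], 2 below vec3[k], otherwise 3.
            # The nested ifelse() calls already imply the lower bounds, so rows equal to a
            # cut point (e.g. group_1 == vec1[i]) no longer fall through to class 3.
            d$group_2 = as.integer(ifelse(d$group_1 < vec1[i], 0, ifelse(d$group_1 < vec2[j], 1, ifelse(d$group_1 < vec3[k], 2, 3))))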
            
            d$group_2 = as.factor(d$group_2)
            
            
            
            TreeFit <- rpart(group_2 ~ ., data = d[,-5])
            
            pred <- predict(
                TreeFit,
                d[,-5], type = "class")
            
            con <- confusionMatrix(
                d$group_2,
                pred) 
            
            #update results into table
            #final_table[i,j] = con$overall[1]
            z <- z + 1
            df$Accuracy[z] <- con$overall[1]
        }
    }
}

head(df)

I could just "sandwich" that procedure between the following lines of code and measure how long it took:

start_time <- proc.time()

#copy and paste the entire block of code here

proc.time() - start_time

#results

 user  system elapsed 
  51.86    0.36   52.22 
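
The same measurement can also be taken with base R's `system.time()`, which wraps an expression and reports the same user/system/elapsed figures. A minimal sketch, where the small workload inside the braces is only a placeholder (an assumption for illustration) standing in for the full block of code:

```r
# Placeholder workload standing in for the full procedure;
# substitute the real block of code here.
timing <- system.time({
  x <- rnorm(1e5)
  s <- sort(x)
})

# "elapsed" is the wall-clock time in seconds
timing[["elapsed"]]
```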

But suppose it is a really lengthy procedure and I want to roughly estimate how long it will take for my computer to run before actually running it - is this possible?

Thanks

stevec On

Since you're using nested loops, instead of timing the whole thing, try timing just the first iteration, or a small number of iterations, of the loop.

E.g. instead of

for (i in seq_along(vec1)) { 
    for (j in seq_along(vec2)) {
        for (k in seq_along(vec3)) {

try iterating along only the first few elements of each

for (i in seq_along(vec1[1:3])) { 
    for (j in seq_along(vec2[1:3])) {
        for (k in seq_along(vec3[1:3])) {

or whatever makes sense for your use case.

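Once you know the timing for a small subset of the iterations, you can extrapolate: if every iteration does roughly the same amount of work, the total run time is approximately the time per iteration multiplied by the total number of iterations. The same kind of educated guess can be made for larger datasets, although run time will not necessarily grow in proportion to the number of rows.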
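
The extrapolation can be sketched roughly as follows. This assumes each iteration costs about the same, which seems plausible here because every iteration fits a tree on the same 80000 rows. `run_one()` is a hypothetical stand-in for the loop body, not part of the original code:

```r
# Hypothetical stand-in for one iteration of the loop body; replace it
# with the real work (the rpart fit, predict and confusionMatrix calls).
run_one <- function(i, j, k) {
  x <- rnorm(2e4)
  sum(sort(x))
}

n_total  <- 5 * 5 * 5   # iterations in the full triple loop
n_sample <- 2 * 2 * 2   # iterations actually timed

# Time only the first two elements of each vector
elapsed <- system.time({
  for (i in 1:2) {
    for (j in 1:2) {
      for (k in 1:2) {
        run_one(i, j, k)
      }
    }
  }
})[["elapsed"]]

# Scale the per-iteration cost up to the full loop
per_iteration   <- elapsed / n_sample
estimated_total <- per_iteration * n_total

cat(sprintf("Estimated total: about %.1f seconds\n", estimated_total))
```

As a sanity check against the numbers in the question: 52.22 seconds over 125 iterations is about 0.42 seconds per iteration, so a sample of 8 iterations of the real loop body should take a little over 3 seconds. The first iteration can be slower than later ones, so timing a handful of iterations is likely to give a steadier estimate than timing just one.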