How to speed up filling a matrix of size (n, m) with Gower distances?

I'm working on my own project, in which I'm preparing a KNN method for each type of problem (regression, multiclass classification, binary classification). I want to use the Gower distance for problems with mixed data.

I implemented the distance in R, and it gives exactly the same results as:

library(StatMatch)
gower.dist()

The results are equal, so the definition of the Gower distance should be implemented correctly. My code is below:

gowerDist <- function(info, x1, x2) {
  sum <- 0
  for (i in 1:ncol(x1)) {
    if (info$type[i] %in% c("categorial", "binary")) {
      if (x1[, i] != x2[, i]) {
        sum <- sum + 1
      }
    } else if (info$type[i] %in% c("ordered", "numeric")) {
      if (info$range[i] != 0) {
        sum <- sum + abs((as.numeric(x2[, i]) - as.numeric(x1[, i]))/
                           info$range[i])
      }
    } 
  }
  return(sum/(ncol(x1)))
}
gowerDistance <- function(dataTrain, dataTest) {
  data_imported <- rbind.data.frame(dataTrain, dataTest)
  information <- list(range=c(), type=c())
  for (i in 1:ncol(data_imported)) {
    information$type[i] <- check_variable_type(data_imported[, i])
    if (information$type[i] == "numeric") {
      information$range[i] <- c(as.numeric(max(data_imported[, i])) - 
                                  as.numeric(min(data_imported[, i])))
    } else if (information$type[i] == "ordered") {
      numeric_values <- as.numeric(data_imported[, i])
      information$range[i] <- diff(range(numeric_values))
    } else information$range[i] <- NA
  }
  distances <- matrix( 0, nrow(dataTest), nrow(dataTrain))
  for (i in 1:nrow(dataTrain)) {
    for (j in 1:nrow(dataTest)) {
      distances[j, i] <- gowerDist(information, dataTest[j, ], dataTrain[i, ])
    }
  }
  return(distances)
}
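
Written as a formula, the code above computes the following for two rows x and y with p columns. This is only a restatement of the code; the symbols are my own notation, with R_k standing for information$range of column k:

```latex
d(x, y) = \frac{1}{p} \sum_{k=1}^{p} \delta_k(x, y), \qquad
\delta_k(x, y) =
\begin{cases}
\mathbb{1}[x_k \neq y_k] & \text{if column } k \text{ is categorial or binary},\\[4pt]
\dfrac{|x_k - y_k|}{R_k} & \text{if column } k \text{ is numeric or ordered and } R_k \neq 0,\\[4pt]
0 & \text{otherwise.}
\end{cases}
```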

The problem I'm having is the time complexity of this code: with around 300 observations it already executes slowly, and with more observations it may run for hours. I want to cross-validate models using the Gower distance, so I'd like to speed this up, but I don't want to implement complex data structures. Is it possible to speed up this code?
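
One direction that might avoid the per-pair function calls is to loop over columns instead of over row pairs, and let base R's outer() build a whole test-by-train matrix for each column. The block below is only a sketch under assumptions, not a verified replacement: gowerDistanceVec, types and ranges are hypothetical names (the last two stand in for information$type and information$range from the code above), and the toy data is made up for illustration.

```r
# Hypothetical column-wise variant of gowerDistance().
# `types` and `ranges` are assumed to hold one entry per column,
# in the same form as information$type and information$range.
gowerDistanceVec <- function(dataTrain, dataTest, types, ranges) {
  distances <- matrix(0, nrow(dataTest), nrow(dataTrain))
  for (k in seq_len(ncol(dataTrain))) {
    if (types[k] %in% c("categorial", "binary")) {
      # Mismatch indicator for every (test, train) pair at once.
      a <- as.character(dataTest[[k]])
      b <- as.character(dataTrain[[k]])
      distances <- distances + outer(a, b, "!=")
    } else if (types[k] %in% c("ordered", "numeric")) {
      # Range-scaled absolute difference; zero-range columns contribute 0.
      if (!is.na(ranges[k]) && ranges[k] != 0) {
        a <- as.numeric(dataTest[[k]])
        b <- as.numeric(dataTrain[[k]])
        distances <- distances + abs(outer(a, b, "-")) / ranges[k]
      }
    }
  }
  distances / ncol(dataTrain)
}

# Toy data (made up): one numeric and one categorical column.
train <- data.frame(age = c(20, 40, 60),
                    color = c("red", "blue", "red"),
                    stringsAsFactors = FALSE)
test <- data.frame(age = c(30, 60),
                   color = c("red", "green"),
                   stringsAsFactors = FALSE)
types <- c("numeric", "categorial")
ranges <- c(40, NA)  # range of age over train and test combined

d <- gowerDistanceVec(train, test, types, ranges)
print(d)  # rows are test observations, columns are train observations
```

The number of R-level loop iterations then grows with the number of columns rather than with nrow(dataTest) * nrow(dataTrain), and no one-row data frames are extracted inside the loop. Categorical columns are compared as character strings here, which is an assumption that sidesteps factors with differing level sets.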
