How to speed up filling matrix of size (n,m) with gower distance?

42 Views Asked by Michał Mazur At 21 January 2024 at 00:28

I'm working on my own project: a KNN implementation that covers each type of problem (regression, multiclass classification, binary classification). I want to use Gower distance for problems where the data is mixed.

I implemented the distance in R, and it gives exactly the same values as:

library(StatMatch)
gower.dist()

Results are equal, so the definition of Gower should be implemented correctly. My code below:

gowerDist <- function(info, x1, x2) {
  sum <- 0
  for (i in 1:ncol(x1)) {
    if (info$type[i] %in% c("categorial", "binary")) {
      if (x1[, i] != x2[, i]) {
        sum <- sum + 1
      }
    } else if (info$type[i] %in% c("ordered", "numeric")) {
      if (info$range[i] != 0) {
        sum <- sum + abs((as.numeric(x2[, i]) - as.numeric(x1[, i]))/
                           info$range[i])
      }
    } 
  }
  return(sum/(ncol(x1)))
}

gowerDistance <- function(dataTrain, dataTest) {
  data_imported <- rbind.data.frame(dataTrain, dataTest)
  information <- list(range=c(), type=c())
  for (i in 1:ncol(data_imported)) {
    information$type[i] <- check_variable_type(data_imported[, i])
    if (information$type[i] == "numeric") {
      information$range[i] <- c(as.numeric(max(data_imported[, i])) - 
                                  as.numeric(min(data_imported[, i])))
    } else if (information$type[i] == "ordered") {
      numeric_values <- as.numeric(data_imported[, i])
      information$range[i] <- diff(range(numeric_values))
    } else information$range[i] <- NA
  }
  distances <- matrix( 0, nrow(dataTest), nrow(dataTrain))
  for (i in 1:nrow(dataTrain)) {
    for (j in 1:nrow(dataTest)) {
      distances[j, i] <- gowerDist(information, dataTest[j, ], dataTrain[i, ])
    }
  }
  return(distances)
}

The problem I'm having is with the time complexity of this code: with around 300 observations it already runs slowly, and with more it might be stuck for hours. I want to cross-validate models using Gower distance, so I'd like to speed this up, but I don't want to implement complex structures. Is it possible to speed up this code?
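
For reference, one direction that might help is to drop the per-pair function call and fill the matrix one column of the data at a time with `outer()`. The following is only a rough sketch under some assumptions: `gowerDistanceFast` and its `types` argument are hypothetical names (`types` stands in for the output of `check_variable_type`, which is not shown here), any type other than "numeric" or "ordered" is treated as categorical, ordered factors are assumed to share the same levels in both data frames, and there are no missing values.

```r
# Sketch: build the (nrow(dataTest) x nrow(dataTrain)) matrix with one
# outer() call per variable instead of one call per pair of rows.
# `types` is a hypothetical character vector with one entry per column.
gowerDistanceFast <- function(dataTrain, dataTest, types) {
  stopifnot(ncol(dataTrain) == ncol(dataTest),
            length(types) == ncol(dataTrain))
  distances <- matrix(0, nrow(dataTest), nrow(dataTrain))
  for (k in seq_len(ncol(dataTrain))) {
    if (types[k] %in% c("ordered", "numeric")) {
      a <- as.numeric(dataTest[[k]])
      b <- as.numeric(dataTrain[[k]])
      # Range over train and test together, as in the original code
      rng <- diff(range(c(a, b)))
      if (rng != 0) {
        distances <- distances + abs(outer(a, b, "-")) / rng
      }
    } else {
      # Assumption: everything else is categorical or binary (0/1 mismatch)
      a <- as.character(dataTest[[k]])
      b <- as.character(dataTrain[[k]])
      distances <- distances + outer(a, b, "!=")
    }
  }
  # Average over all variables, matching sum / ncol(x1) in gowerDist
  distances / ncol(dataTrain)
}
```

The loop now runs once per variable rather than once per pair of observations, and it avoids extracting single rows from a data frame (`dataTest[j, ]`), which is likely where much of the time goes in the original.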
