Performance problem with igraph when studying Axelrod Model

66 Views Asked by At

When trying to apply Axelrod Model for Dissemination of Culture with the R igraph package, I get performance issues on the time of executions.

After generating an LxL lattice, I assign a vector of F random numbers ranging from 1 to q to every node.

Then I iterate by:

  • choosing a random node s
  • choosing a random neighbor n of node s
  • calculate how many numbers they have in common (equals) as a percentage
  • with probability equal to the previous percentage, I take one of n's feature that's different from s and assign that to s

Repeat that until convergence (i.e. every node has neighbours either completely equal or completely similar).

Here the code (F, q, R are all given non-critical values present in other cells):

calc_edges_weights <- function(g){
  
  edge_weights <- replicate(length(E(g)), 1, simplify=FALSE)
  
  for(i in 1:length(E(g))){
    
    edge <- E(g)[i]
    
    ##Extract the two nodes
    first_node <- ends(g, edge)[1]
    second_node <- ends(g, edge)[2]
    
    compatibility <- chance_interaction(V(g)$cultures[[first_node]], V(g)$cultures[[second_node]])
    
    ## Assign the values for representation
    edge_weights[i] <- compatibility
  }
  
  return(edge_weights)
}



chance_interaction <- function(cul1, cul2){
  common <- sum(cul1==cul2)
  total <- length(cul1)
  perc_eq <- common/total
  return(perc_eq)
}



epoch <- 0

N <- L**2

g <- make_lattice(length = L, dim = 2)
V(g)$cultures <- replicate(N, sample(1:q, F, replace=TRUE), simplify=FALSE)
# plot_graph(g, labels=FALSE)

edge_weights <- calc_edges_weights(g)

while(TRUE){
  
  epoch <- epoch + 1
  
  ##Choose one vertex at random
  chosen <- sample(1:N, 1, replace=TRUE)
  
  ##Take neighbours of that vertex, pick one at random
  neighbours <- neighborhood(g, chosen, order=R)[[1]]
  chosen_neighbour <- neighbours[sample(1:length(neighbours), 1, replace=TRUE)]

  ##Calculate chance of interaction
  chance_change <- chance_interaction(V(g)$cultures[[chosen]], V(g)$cultures[[chosen_neighbour]])
  
  ##If success in chance, update one of the remaining traits
  if((runif(1, min=0, max=chance_change) < chance_change) && chance_change!=1){
    loop_flag = TRUE

    while(loop_flag){
      
      feat_index <- sample(1:F, 1, replace=TRUE)
      
      if(V(g)$cultures[[chosen]][feat_index] != V(g)$cultures[[chosen_neighbour]][feat_index]){
        
        new_attr <- V(g)$cultures[[chosen]]
        
        new_attr[feat_index] <- V(g)$cultures[[chosen_neighbour]][feat_index]

        V(g)$cultures[[chosen]] <- new_attr
        loop_flag <- FALSE
        
        
      }
    }
  }
  
  ##Stop conditions
  if(all(edge_weights %in% c(0, 1))){ break }
  
  if(epoch%%(200) == 0){
    edge_weights <- calc_edges_weights(g)
    n_alpha <- sum(edge_weights != 0 & edge_weights != 1)/length(edge_weights)
    cat(epoch, ",", n_alpha, "\n")
  }
  
}

Any cat command is for showing purposes when testing.

You can use these as example values:

L <- 10
q <- 40
F <- 10
R <- 1

The problem here is that convergence is waaay slow and time of convergence grows exponentially with Lattice size. For example, with L = 10 I get a convergence after 30 seconds, whereas with L = 22 I get it after 2 hours (!!).

I tried changing the operations, finding other functions in the library but I can't seem to improve the efficiency.

Time analysis confirms an exponential increase with L size (as expected by the theory) but it should not be this slow.

I tried reducing the amount of times the edge_weights are calculated (since they are only useful to STOP the evolution) but to no avail.

Is there a better way to approach this issue?

1

There are 1 best solutions below

2
jblood94 On

An approach that can simulate the L = 22 case in a matter of seconds:

By keeping track of the probability of the next trait transfer occurring across each edge, we can avoid looping through epochs that result in no transfers. This will increase performance, and we can still keep track of the number of epochs that pass using a negative binomial random variate (sampled using rnbinom) to simulate the number of epochs resulting in no transfers between the epochs where transfers occur.

The function below uses a data.table approach. Each row of the data.table represents a directed edge. The column labeled w is the probability that the edge is selected during the next epoch and the edge is selected to transfer a trait. However, given the sampling method, an edge can be selected to transfer a trait when there are no traits eligible for transfer (because the receiving vertex already has all the traits present in the originating vertex), so we also keep track of the number of traits eligible for transfer across each edge (the column labeled n).

The function:

fDissSim <- function(L, q, F, R) {
  epoch <- 0L # initialize the epoch count
  
  # create a random matrix of initial trait occurrences (each node gets `F`
  # random traits, numbered 1 to `q`, with replacement)
  m <- matrix(FALSE, q, L^2)
  m[cbind(sample(q, F*L^2, 1), rep(1:L^2, F))] <- TRUE
  
  # create a data.table of edges
  dt <- neighborhood(make_lattice(c(L, L)), R, mindist = 1)
  dt <- data.table(from = rep.int(1:length(dt), lengths(dt)), to = unlist(dt))
  dt <- rbindlist(
    list(
      dt,
      dt[,.(from = to, to = from)]
    )
  )
  # Initialize a matrix storing the eligibility of a trait to be transferred
  # along an edge. The traits are along the rows, and the edges are along the
  # columns.
  a <- dt[,m[,from] & !m[,to]]
  dt[
    # get the probability of the edge being selected at any given epoch and
    # divide that by q
    ,`:=`(p = 1/tabulate(from)[from]/L^2/q, n = as.integer(colSums(a)))
  ][
    # the probability that a transfer will occur across each edge on the next
    # epoch (sampling weights)
    ,w := p*colSums(m[,from]*m[,to])
  ]
  
  N <- nrow(dt) # the number of edges
  
  # for each vertex, list the edges where the vertex is the originating one
  iTo <- dt[,.(.(.I)), keyby = to][[2]]
  # for each vertex, list the edges where the vertex is the receiving one
  iFrom <- dt[,.(.(.I)), keyby = from][[2]]
  
  # perform the simulation
  while(any(dt$n)) {
    ww <- dt$w*(dt$n != 0L) # weights for edge sampling
    epoch <- epoch + rnbinom(1, 1, sum(ww)) + 1L # increment epoch
    # sample an edge
    i <- sample(N, 1, 0, ww)
    v <- dt$to[i] # the receiving vertex
    # sample a trait to be transferred across the edge
    j <- which(a[,i]) # don't sample this directly since it can have length 1
    j <- j[sample(length(j), 1)]
    
    # update `a` where vertex `v` is an originating vertex
    b <- a[j, iFrom[[v]]] <- !m[j, dt$to[iFrom[[v]]]]
    # update the number of traits eligible for transfer across affected edges
    dt[iFrom[[v]][b], n := n + 1L]
    # update the sampling weights for affected edges
    dt[iFrom[[v]][!b], w := w + p]
    
    # update `a` where `v` is a receiving vertex
    a[j, iTo[[v]]] <- FALSE
    # update the sampling weights and number of traits eligible for transfer
    dt[iTo[[v]][m[j, from[iTo[[v]]]]], `:=`(w = w + p, n = n - 1L)]
    
    # update the trait matrix
    m[j, v] <- TRUE
  }
  
  list(epochs = epoch, traits = m)
}

Testing with timing. The L = 10, q = 40, F = 10, R = 1 case takes a couple seconds to simulate nearly 18k epochs.

set.seed(546436659)
system.time(res <- fDissSim(10, 40, 10, 1))
#>    user  system elapsed 
#>    2.09    0.01    2.11
res[[1]]
#> [1] 17868
all(res[[2]])
#> [1] TRUE

Changing R to 2:

system.time(res <- fDissSim(10, 40, 10, 2))
#>    user  system elapsed 
#>    2.25    0.05    2.29
res[[1]]
#> [1] 20711
all(res[[2]])
#> [1] TRUE

The L = 22 case takes less than 20 seconds to simulate more than 100k epochs.

system.time(res <- fDissSim(22, 40, 10, 1))
#>    user  system elapsed 
#>   16.26    0.33   16.63
res[[1]]
#> [1] 103159
all(res[[2]])
#> [1] TRUE

Stress-test with L = 100. It simulates more than 2M epochs in about 30 minutes.

system.time(res <- fDissSim(100, 40, 10, 1))
#>    user  system elapsed 
#> 2014.94   20.14 2037.32
res[[1]]
#> [1] 2084378
all(res[[2]])
#> [1] TRUE