Using arma::mat instead of NumericVector made my code twice as fast. Why?

62 Views Asked by At

I was trying to optimize some of the Rcpp code in the mrf2d package. After some profiling I found the bottleneck was on the following function:

NumericVector conditional_probabilities_mrf(const IntegerMatrix &Z,
                                            const IntegerVector position,
                                            const IntegerMatrix R,
                                            const arma::fcube &theta,
                                            const int N, const int M,
                                            const int n_R, const int C){

  NumericVector probs(C+1);
  double this_prob;
  int dx, dy;
  int x = position[0] -1; int y = position[1] -1;

  for(int value = 0; value <= C; value++){
    this_prob = 0.0;
    for(int i = 0; i < n_R; i++){
      dx = R(i,0); dy = R(i,1);
      if(0 <= x+dx && x+dx < N && 0 <= y+dy && y+dy < M){
        this_prob = this_prob + theta(value, Z(x+dx, y+dy), i);}
      if(0 <= x-dx && x-dx < N && 0 <= y-dy && y-dy < M){
        this_prob = this_prob + theta(Z(x-dx, y-dy), value, i);}
    }
    probs[value] = exp(this_prob);
  }
  return(probs/sum(probs));
}

While exploring my options, I was tweaking with types and started using more Armadillo types, and changed to (note that the only difference is the use of arma::colvec instead of NumericVector):

arma::colvec conditional_probabilities_mrf(const IntegerMatrix &Z,
                                            const IntegerVector position,
                                            const IntegerMatrix R,
                                            const arma::fcube &theta,
                                            const int N, const int M,
                                            const int n_R, const int C){

  arma::colvec probs(C+1);
  double this_prob;
  int dx, dy;
  int x = position[0] -1; int y = position[1] -1;

  for(int value = 0; value <= C; value++){
    this_prob = 0.0;
    for(int i = 0; i < n_R; i++){
      dx = R(i,0); dy = R(i,1);
      if(0 <= x+dx && x+dx < N && 0 <= y+dy && y+dy < M){
        this_prob = this_prob + theta(value, Z(x+dx, y+dy), i);}
      if(0 <= x-dx && x-dx < N && 0 <= y-dy && y-dy < M){
        this_prob = this_prob + theta(Z(x-dx, y-dy), value, i);}
    }
    probs[value] = exp(this_prob);
  }
  return(probs/sum(probs));
}

This change alone made my code a lot faster, see the following results from the profiler (before and after, respectively): enter image description here

enter image description here

The same code was used in the two pictures of the profiler data. Note that, the function that appears as a bottleneck in the profiler data is the log_pl_mrf that uses the conditional_probabilities_mrf function, and is defined as:

double log_pl_mrf(const IntegerMatrix Z,
                  const IntegerMatrix R,
                  const arma::fcube theta){

  int N = Z.nrow(); int M = Z.ncol();
  int n_R = R.nrow();
  int C = theta.n_rows - 1;
  double log_pl = 0.0;
  double this_cond_prob = 0.0;
  IntegerVector position(2);
  int zij = -1;

  for(int i = 0; i < N; i++){
    for(int j = 0; j < M; j++){
      zij = Z(i,j);
      position[0] = i+1; position[1] = j+1;
      this_cond_prob = conditional_probabilities_mrf(Z, position, R, theta, N, M, n_R, C)(zij);
      log_pl += log(this_cond_prob);
    }
  }
  return(log_pl);
}

Why did the simple change of return type made the performance so much better? How can I understand that? While I'm glad that it did improve, I think understanding why would be important so I could possibly find more performance improvement opportunities in the future.

0

There are 0 best solutions below