I was trying to optimize some of the Rcpp code in the mrf2d package. After some profiling I found the bottleneck was on the following function:
NumericVector conditional_probabilities_mrf(const IntegerMatrix &Z,
const IntegerVector position,
const IntegerMatrix R,
const arma::fcube &theta,
const int N, const int M,
const int n_R, const int C){
NumericVector probs(C+1);
double this_prob;
int dx, dy;
int x = position[0] -1; int y = position[1] -1;
for(int value = 0; value <= C; value++){
this_prob = 0.0;
for(int i = 0; i < n_R; i++){
dx = R(i,0); dy = R(i,1);
if(0 <= x+dx && x+dx < N && 0 <= y+dy && y+dy < M){
this_prob = this_prob + theta(value, Z(x+dx, y+dy), i);}
if(0 <= x-dx && x-dx < N && 0 <= y-dy && y-dy < M){
this_prob = this_prob + theta(Z(x-dx, y-dy), value, i);}
}
probs[value] = exp(this_prob);
}
return(probs/sum(probs));
}
While exploring my options, I was tweaking with types and started using more Armadillo types, and changed to (note that the only difference is the use of arma::colvec instead of NumericVector):
arma::colvec conditional_probabilities_mrf(const IntegerMatrix &Z,
const IntegerVector position,
const IntegerMatrix R,
const arma::fcube &theta,
const int N, const int M,
const int n_R, const int C){
arma::colvec probs(C+1);
double this_prob;
int dx, dy;
int x = position[0] -1; int y = position[1] -1;
for(int value = 0; value <= C; value++){
this_prob = 0.0;
for(int i = 0; i < n_R; i++){
dx = R(i,0); dy = R(i,1);
if(0 <= x+dx && x+dx < N && 0 <= y+dy && y+dy < M){
this_prob = this_prob + theta(value, Z(x+dx, y+dy), i);}
if(0 <= x-dx && x-dx < N && 0 <= y-dy && y-dy < M){
this_prob = this_prob + theta(Z(x-dx, y-dy), value, i);}
}
probs[value] = exp(this_prob);
}
return(probs/sum(probs));
}
This change alone made my code a lot faster, see the following results from the profiler (before and after, respectively):

The same code was used in the two pictures of the profiler data. Note that, the function that appears as a bottleneck in the profiler data is the log_pl_mrf that uses the conditional_probabilities_mrf function, and is defined as:
double log_pl_mrf(const IntegerMatrix Z,
const IntegerMatrix R,
const arma::fcube theta){
int N = Z.nrow(); int M = Z.ncol();
int n_R = R.nrow();
int C = theta.n_rows - 1;
double log_pl = 0.0;
double this_cond_prob = 0.0;
IntegerVector position(2);
int zij = -1;
for(int i = 0; i < N; i++){
for(int j = 0; j < M; j++){
zij = Z(i,j);
position[0] = i+1; position[1] = j+1;
this_cond_prob = conditional_probabilities_mrf(Z, position, R, theta, N, M, n_R, C)(zij);
log_pl += log(this_cond_prob);
}
}
return(log_pl);
}
Why did the simple change of return type made the performance so much better? How can I understand that? While I'm glad that it did improve, I think understanding why would be important so I could possibly find more performance improvement opportunities in the future.
