Speeding up array access in R


I have R code that has a pair of relatively slow array access steps that I want to speed up. Essentially, it looks something like this:

termsA = matrix(data = NA, nrow = nrow(matLargeA), ncol = 15)
termsB = energy.terms

for (j in 1:15) {
    termsA[, j] = matSmall[matLargeA[, j], j] #***
    termsB[, j] = matSmall[matLargeB[, j], j] #***
}

rowsA = rowSums(termsA)
rowsB = rowSums(termsB)

The lines I'm trying to speed up are the ones that end in #***.

matSmall is a 4 x 15 matrix of nonnegative doubles. It is mostly arbitrary: the only weird things about it are that it must have exactly one zero element in each column, and that all elements are less than 4. Otherwise, the columns do not relate to each other in any way.

Both matLargeA and matLargeB are 1000s x 15 matrices of doubles whose entries are the integers 1 through 4. However, they are NOT arbitrarily arranged: each matrix is based on a different sequence of integers 1 through 4, and the rows are 15 element windows of that sequence. For example:

> matLargeA[1:17,]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
 [1,]    3    4    3    1    1    3    2    2    3     3     3     1     1
 [2,]    4    3    1    1    3    2    2    3    3     3     1     1     3
 [3,]    3    1    1    3    2    2    3    3    3     1     1     3     2
 [4,]    1    1    3    2    2    3    3    3    1     1     3     2     3
 [5,]    1    3    2    2    3    3    3    1    1     3     2     3     3
 [6,]    3    2    2    3    3    3    1    1    3     2     3     3     1
 [7,]    2    2    3    3    3    1    1    3    2     3     3     1     2
 [8,]    2    3    3    3    1    1    3    2    3     3     1     2     3
 [9,]    3    3    3    1    1    3    2    3    3     1     2     3     1
[10,]    3    3    1    1    3    2    3    3    1     2     3     1     3
[11,]    3    1    1    3    2    3    3    1    2     3     1     3     3
[12,]    1    1    3    2    3    3    1    2    3     1     3     3     1
[13,]    1    3    2    3    3    1    2    3    1     3     3     1     4
[14,]    3    2    3    3    1    2    3    1    3     3     1     4     2
[15,]    2    3    3    1    2    3    1    3    3     1     4     2     4
[16,]    3    3    1    2    3    1    3    3    1     4     2     4     1
[17,]    3    1    2    3    1    3    3    1    4     2     4     1     3
      [,14] [,15]
 [1,]     3     2
 [2,]     2     3
 [3,]     3     3
 [4,]     3     1
 [5,]     1     2
 [6,]     2     3
 [7,]     3     1
 [8,]     1     3
 [9,]     3     3
[10,]     3     1
[11,]     1     4
[12,]     4     2
[13,]     2     4
[14,]     4     1
[15,]     1     3
[16,]     3     4
[17,]     4     4 
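
To make the window structure concrete, here is a minimal sketch of how such a matrix could be built from an underlying sequence. The names seqA, n and w, and the random sequence itself, are assumptions for illustration only; the real sequence behind the output above is not shown.

```r
# Hypothetical underlying sequence of integers 1 through 4
# (random here; it stands in for the real sequence).
set.seed(1)
w <- 15L                                   # window width = number of columns
seqA <- sample(1:4, 1000L + w - 1L, replace = TRUE)
n <- length(seqA) - w + 1L                 # number of windows = number of rows

# Row i holds seqA[i:(i + w - 1)], so entry [i, j] equals seqA[i + j - 1].
# outer() builds the n x w matrix of those positions in one step.
pos <- outer(seq_len(n), seq_len(w) - 1L, "+")
matLargeA <- matrix(seqA[pos], nrow = n, ncol = w)

# Each row is the previous row shifted left by one element.
matLargeA[1:3, ]
```

In this layout each column of matLargeA is just a shifted slice of the same sequence, which is the non-arbitrary arrangement described above.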

What I mainly need for the rest of the program are rowsA and rowsB. I generate termsA and termsB first because this column-first access is a LOT faster than row-first access: it was one of the first major optimizations I made. The array access lines marked with #*** are the last major bottleneck in code that I would really like to run quickly. The function containing these lines is called on the order of 10^7 times, so even minor speedups will reduce my run time substantially. I'm about 99% sure that the slowdown comes from memory access, but I don't see an easy way to make it faster. So how can I make this memory access faster?
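
For reference, one variant that could be benchmarked against the loop is a single lookup through linear (column-major) indices, with the per-column offsets computed once outside the hot function. This is only a sketch under assumptions: the random data stands in for the real matrices, and rows_loop, rows_linear and colOffset are hypothetical names. Whether it is actually faster would need to be measured on the real workload.

```r
# Stand-in data with the shapes described in the question (values are random).
set.seed(42)
n <- 1000L
w <- 15L
matSmall  <- matrix(runif(4L * w, 0, 4), nrow = 4L, ncol = w)
matLargeA <- matrix(sample(1:4, n * w, replace = TRUE), nrow = n, ncol = w)

# Baseline: column-by-column lookup, as in the loop from the question.
rows_loop <- function(matLarge, matSmall) {
  terms <- matrix(NA_real_, nrow = nrow(matLarge), ncol = ncol(matLarge))
  for (j in seq_len(ncol(matLarge))) {
    terms[, j] <- matSmall[matLarge[, j], j]
  }
  rowSums(terms)
}

# Computed once: where column j of matSmall starts in its column-major
# storage, i.e. (j - 1) * nrow(matSmall), repeated down each column.
colOffset <- matrix(rep((seq_len(w) - 1L) * nrow(matSmall), each = n),
                    nrow = n, ncol = w)

# Variant: one vectorized lookup, matSmall[i, j] == matSmall[i + 4 * (j - 1)].
rows_linear <- function(matLarge, matSmall, colOffset) {
  terms <- matSmall[matLarge + colOffset]  # plain vector of length n * w
  dim(terms) <- dim(matLarge)              # reshape back to n x w
  rowSums(terms)
}

rowsA_loop   <- rows_loop(matLargeA, matSmall)
rowsA_linear <- rows_linear(matLargeA, matSmall, colOffset)
stopifnot(isTRUE(all.equal(rowsA_loop, rowsA_linear)))
```

The offsets depend only on the matrix shapes, so they can be reused across calls as long as the dimensions stay the same. The same function applies unchanged to matLargeB.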
