Cache-friendly sqare matrix transposition logic issue

67 Views Asked by At

This code is a transposition algorithm specifically optimized to work on square matrices with rows and columns that are multiples of 8. The optimization involves processing components later that satisfy certain conditions, which are commented out in the code. However, the transpose fails in the wrong place.

void block8x4_opt(int M, int N, int A[N][M], int B[M][N])
{
    register int i, j, i1, j1;
    for (i = 0; i < N; i += 8)
        for (j = 0; j < M; j += 4) {
            if (j == i || j == i + 4) { /* same block */
                for (i1 = i; i1 < i + 8 ; i1++)
                    for (j1 = j; j1 < j + 4 ; j1++) {
                        if (i1 == j1 || i1 == j1 + 4) /* same row */
                            continue;
                        B[j1][i1] = A[i1][j1];
                    }
                for (j1 = j; j1 < j + 4; j1++) {
                    B[j1][j1] = A[j1][j1]; /* i1 == j1 */
                    B[j1][j1+4] = A[j1+4][j1]; /* i1 == j1 + 4 */
                }
                continue;   
            }
            for (i1 = i; i1 < i + 8 ; i1++)
                for (j1 = j; j1 < j + 4 ; j1++)
                    B[j1][i1] = A[i1][j1];
        }
}

Below is the result of transposing 8x8 matrix A to B using the code above.

the problem occurs at (0,5),(1,6),(2,7).

The weird thing is, the part I skipped with the above algorithm is the top of that part, the line 4-13-22-31. It's the line just below it that causes the problem, and the other blocks don't cause any problems.

*************** matrix A *****************
         0    1    2    3    4    5    6    7
        ...  ...  ...  ...  ...  ...  ...  ...
  0:     0    1    2    3    4    5    6    7
  1:     8    9   10   11   12   13   14   15
  2:    16   17   18   19   20   21   22   23
  3:    24   25   26   27   28   29   30   31
  4:    32   33   34   35   36   37   38   39
  5:    40   41   42   43   44   45   46   47
  6:    48   49   50   51   52   53   54   55
  7:    56   57   58   59   60   61   62   63

              **** matrix B ****
         0    1    2    3    4    5    6    7
        ...  ...  ...  ...  ...  ...  ...  ...
  0:     0    8   16   24   32   40   48   56
  1:     1    9   17   25   33   41   49   57
  2:     2   10   18   26   34   42   50   58
  3:     3   11   19   27   35   43   51   59
  4:     4   12   20   28   36   44   52   60
  5:     ?   13   21   29   37   45   53   61
  6:     6    0   22   30   38   46   54   62
  7:     7   15    ?   31   39   47   55   63

I would really appreciate it if you could find the problem with my code.

1

There are 1 best solutions below

0
Support Ukraine On BEST ANSWER

Your algorithm is wrong.

Look at this:

for (i = 0; i < N; i += 8)
    for (j = 0; j < M; j += 4) {
        if (j == i || j == i + 4) { /* same block */

For 8x8 matrix both Nand M is 8 so it's like:

for (i = 0; i < 8; i += 8)
    for (j = 0; j < 8; j += 4) {
        if (j == i || j == i + 4) { /* same block */

So which values will i and jtake:

i==0, j==0
i==0, j==4
// Now the inner loop ends as j becomes 8 due j+=4
// Now the outer loop ends as i becomes 8 due i+=8

So the if statement will only be executed for (i,j) as (0,0) and (0,4). If both cases the if condition will be true. As the last statement in the body of the if is a continue, the code will never reach:

        for (i1 = i; i1 < i + 8 ; i1++)
            for (j1 = j; j1 < j + 4 ; j1++)
                B[j1][i1] = A[i1][j1];

Another issue... When j is 4 consider this:

            for (j1 = j; j1 < j + 4; j1++) {
                B[j1][j1] = A[j1][j1]; /* i1 == j1 */
                B[j1][j1+4] = A[j1+4][j1]; /* i1 == j1 + 4 */
            }

So j1 will take the value 4, 5, 6, 7 and you have

                B[j1][j1+4] = A[j1+4][j1]; /* i1 == j1 + 4 */
                      ^^^^
                      ups... out of bound, i.e. 8, 9, 10, 11

A simple advice. Add some printf whenever you write to b. Like:

B[j1][j1+4] = A[j1+4][j1];
printf("Line %d: B[%d][%d] = A[%d][%d]\n", __LINE__, j1, j1+4, j1+4, j1)";

Then it's easy to check all assignments to B for correctness.