Random mask doesn't work with shuffle intrinsic


I'm trying to generate a mask randomly (first fill an array with the values 0 to 15, then shuffle it) and then use it as an argument to the _mm_shuffle_epi8 intrinsic.

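#include <cstdlib>     // rand, srand
#include <ctime>       // time
#include <iostream>    // std::cout
#include <immintrin.h> // _mm_set_epi8; _mm_shuffle_epi8 needs SSSE3

__m128i generate_shuffle_mask() {
    //create and fill array (a local array avoids the leak: the old
    //delete[] after return never ran, and delete[] must not free malloc'd memory)
    unsigned int indices[16];
    for (int i = 0; i < 16; i++) {
        indices[i] = i;
    }
    //randomly swap elements (Fisher-Yates shuffle)
    //note: reseeding on every call gives the same mask for calls within the same second
    srand(time(NULL));
    for (int i = 16 - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        unsigned int temp = indices[i];
        indices[i] = indices[j];
        indices[j] = temp;
    }
    //debug print
    for (int i = 0; i < 16; i++) {
        std::cout << indices[i] << " ";
    }
    std::cout << std::endl;
    //creating mask from array elements
    //note: _mm_set_epi8 takes bytes from 15 down to 0, so indices[0] ends up in byte 15
    __m128i mask = _mm_set_epi8(
        indices[0], indices[1], indices[2], indices[3], indices[4], indices[5], indices[6], indices[7], indices[8], indices[9], indices[10], indices[11], indices[12], indices[13], indices[14], indices[15]
    );
    return mask;
}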
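A minimal portable sketch of the same mask generation, assuming C++11 `<random>` is available: `std::shuffle` performs the Fisher-Yates shuffle, and the caller seeds the generator once instead of calling `srand` inside the function. The names `make_random_mask`, `is_permutation_of_0_to_15` and `Bytes` are hypothetical. Keeping the mask in a byte array and loading it with `_mm_loadu_si128((const __m128i*)mask.data())` puts `mask[i]` in byte `i`; `_mm_set_epi8` instead takes its arguments from byte 15 down to byte 0, and `_mm_setr_epi8` from byte 0 up.

```cpp
#include <algorithm>  // std::shuffle
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>    // std::iota
#include <random>     // std::mt19937

using Bytes = std::array<std::uint8_t, 16>;

// Checks that every value 0..15 appears exactly once in the mask.
bool is_permutation_of_0_to_15(const Bytes& mask) {
    std::array<bool, 16> seen{};
    for (std::uint8_t v : mask) {
        if (v >= 16 || seen[v]) return false;
        seen[v] = true;
    }
    return true;
}

// Hypothetical helper: returns a random permutation of 0..15 as 16 mask bytes.
// The caller creates and seeds the generator once, e.g.
//   std::mt19937 rng(std::random_device{}());
Bytes make_random_mask(std::mt19937& rng) {
    Bytes mask;
    std::iota(mask.begin(), mask.end(), std::uint8_t{0});  // 0, 1, ..., 15
    std::shuffle(mask.begin(), mask.end(), rng);           // Fisher-Yates shuffle
    assert(is_permutation_of_0_to_15(mask));               // debug-only sanity check
    return mask;
}
```

Passing the generator by reference means repeated calls in the same second still produce different masks, which the `srand(time(NULL))` version does not guarantee.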

Then I use this mask in the shuffle instruction:

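__m128i mask = generate_shuffle_mask(); //generate mask
//_mm_loadu_si128 always reads 16 bytes, so str must point to at least 16 readable bytes
//(len is assumed to be <= 16 so the print loops stay inside the vectors)
__m128i data = _mm_loadu_si128((const __m128i*)str); //load 16 bytes into data
printf("Original bytes: ");
for (int i = 0; i < len; ++i) {
    printf("%02X ", ((unsigned char*)&data)[i]);
}
printf("\n");

data = _mm_shuffle_epi8(data, mask); //first shuffle: byte i = old byte mask[i]
printf("After first shuffle: ");
for (int i = 0; i < len; ++i) {
    printf("%02X ", ((unsigned char*)&data)[i]);
}
printf("\n");

__m128i data2 = _mm_shuffle_epi8(data, mask); //second shuffle of the already shuffled data
printf("After second shuffle: ");
for (int i = 0; i < len; ++i) {
    printf("%02X ", ((unsigned char*)&data2)[i]);
}
printf("\n");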

As far as I know, after the second shuffle operation I should get the bytes in the order they were originally in; however, this does not happen. What could be the problem?
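To see what the two shuffles compute, here is a scalar model of `_mm_shuffle_epi8` (a sketch for illustration; `shuffle_bytes`, `restores_after_two_shuffles` and `Bytes` are hypothetical names) following its documented semantics: output byte `i` is zero if bit 7 of `mask[i]` is set, otherwise it is `in[mask[i] & 0x0F]`. Applying it twice gives `data2[i] = data[mask[mask[i]]]`, so the second shuffle restores the original order only if `mask[mask[i]] == i` for every `i`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 16>;

// Scalar model of _mm_shuffle_epi8 (PSHUFB): out[i] = in[mask[i] & 0x0F],
// or 0 when the high bit (0x80) of mask[i] is set.
Bytes shuffle_bytes(const Bytes& in, const Bytes& mask) {
    Bytes out{};
    for (int i = 0; i < 16; ++i) {
        out[i] = (mask[i] & 0x80) ? std::uint8_t{0} : in[mask[i] & 0x0F];
    }
    return out;
}

// For a permutation mask (each of 0..15 exactly once), checks whether two
// shuffles restore the original order. Feeding the bytes 0..15 as input
// makes position i of the result equal to mask[mask[i]].
bool restores_after_two_shuffles(const Bytes& mask) {
    for (std::uint8_t m : mask) assert(m < 16);  // permutation masks only
    Bytes data{};
    for (int i = 0; i < 16; ++i) data[i] = static_cast<std::uint8_t>(i);
    const Bytes twice = shuffle_bytes(shuffle_bytes(data, mask), mask);
    return twice == data;
}
```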

I don't know why, but if I hardcode the mask, it works correctly:

__m128i mask = _mm_set_epi8( 1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14 );
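Because `_mm_set_epi8` takes its arguments from the highest byte (15) down to byte 0, the hardcoded call places 14 in byte 0, 15 in byte 1, and so on. The sketch below (hypothetical helpers `bytes_from_set_epi8_args` and `is_involution`) reverses the argument list into byte order and checks `mask[mask[i]] == i`. For the hardcoded mask it pairs byte 0 with 14, 1 with 15, 2 with 12, 3 with 13, and so on: every cycle has length 2, so two shuffles cancel out. A randomly shuffled 0..15 array usually has longer cycles.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 16>;

// Hypothetical helper: converts the argument list of _mm_set_epi8(e15, ..., e0)
// into byte order, so that result[i] is byte i of the resulting __m128i.
Bytes bytes_from_set_epi8_args(const Bytes& args) {
    Bytes bytes{};
    for (int i = 0; i < 16; ++i) {
        bytes[i] = args[15 - i];  // first argument is byte 15, last is byte 0
    }
    return bytes;
}

// A permutation mask undoes itself (two shuffles restore the input)
// exactly when mask[mask[i]] == i for every byte position i.
bool is_involution(const Bytes& mask) {
    for (int i = 0; i < 16; ++i) {
        assert(mask[i] < 16);  // only plain permutation masks are considered
        if (mask[mask[i]] != i) return false;
    }
    return true;
}
```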

Answer by anatolyg:

"after the second shuffle operation I should get the bytes in the order they were originally in": not necessarily. Your hard-coded mask only swaps elements in pairs (all of its cycles have length 2), so two shuffles restore the original order for that particular mask. Try 1, 2, 3, 4, 5, ..., 15, 0 instead: you would need 16 shuffles to get back to the original value.

– Jarod42
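To undo an arbitrary random shuffle in one step, one option is to shuffle with the inverse permutation, `inverse[mask[i]] = i`: the second shuffle then yields `x[mask[inverse[i]]] = x[i]`. Repeating the same mask restores the input only after as many applications as the least common multiple of the permutation's cycle lengths, which is 16 for the rotation mask 1, 2, ..., 15, 0. The sketch below computes both on the 16 mask bytes before they are loaded into a `__m128i`; `inverse_mask`, `shuffles_until_identity`, `gcd_int` and `Bytes` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 16>;

// Inverse permutation: inverse[mask[i]] = i. Shuffling with mask and then with
// the inverse gives x[mask[inverse[i]]] = x[i], i.e. the original bytes.
Bytes inverse_mask(const Bytes& mask) {
    Bytes inverse{};
    for (int i = 0; i < 16; ++i) {
        assert(mask[i] < 16);  // plain permutation masks only
        inverse[mask[i]] = static_cast<std::uint8_t>(i);
    }
    return inverse;
}

// Greatest common divisor (Euclid), used for the lcm below.
int gcd_int(int a, int b) {
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

// How many times the same mask must be applied before the input comes back:
// the least common multiple of the permutation's cycle lengths.
int shuffles_until_identity(const Bytes& mask) {
    std::array<bool, 16> visited{};
    int order = 1;
    for (int start = 0; start < 16; ++start) {
        if (visited[start]) continue;
        int length = 0;
        for (int i = start; !visited[i]; i = mask[i]) {  // walk one cycle
            assert(mask[i] < 16);
            visited[i] = true;
            ++length;
        }
        order = order / gcd_int(order, length) * length;
    }
    return order;
}
```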