C - Why does #pragma pack(1) treat a 6-bit struct member as 8 bits?

I'm stuck on what looks like wrong behavior from #pragma pack(1): when I define a 6-bit field, it seems to be treated as 8 bits. I read this question while trying to solve my problem, but it didn't help at all.

In Visual Studio 2012 I defined the struct below to store Base64 characters:

#pragma pack(1)
struct BASE64 {
    CHAR    cChar1 : 6;
    CHAR    cChar2 : 6;
    CHAR    cChar3 : 6;
    CHAR    cChar4 : 6;
};

Then I got its size with sizeof, but the result isn't what I expected:

printf("%u", (unsigned)sizeof(struct BASE64));      // expected 3

Result : 4

I expected to get 3 (because 6 * 4 = 24 bits, which is 3 bytes).

I even tested it with 2-bit fields instead and got the expected size (1 byte):

#pragma pack(1)
struct BASE64 {
    CHAR    cChar1 : 2;
    CHAR    cChar2 : 2;
    CHAR    cChar3 : 2;
    CHAR    cChar4 : 2;
};

So why is a 6-bit field treated as 8 bits under #pragma pack(1)?

There are 3 best solutions below

1
On BEST ANSWER

#pragma pack generally packs on byte boundaries, not bit boundaries. Its purpose is to prevent the insertion of padding bytes between fields that you want to keep tightly packed. From Microsoft's documentation (since you provided the winapi tag, and with my emphasis):

n (optional) : Specifies the value, in bytes, to be used for packing.

How an implementation treats bit-fields when you try to get them to cross a byte boundary is implementation-defined. From the C11 standard (section 6.7.2.1 Structure and union specifiers /11, again my emphasis):

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

More of the MS documentation calls out this specific behaviour:

Adjacent bit fields are packed into the same 1-, 2-, or 4-byte allocation unit if the integral types are the same size and if the next bit field fits into the current allocation unit without crossing the boundary imposed by the common alignment requirements of the bit fields.
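
To see what a particular compiler actually does, the two layouts from the question can be measured directly. This is only a sketch: unsigned char stands in for the Windows CHAR typedef so that it builds without windows.h, the names SixBit, TwoBit, six_bit_size and two_bit_size are made up for illustration, and bit-fields of a char type are themselves an implementation-defined extension.

```c
#include <assert.h>
#include <stddef.h>

#pragma pack(push, 1)
/* Four 6-bit fields: 24 bits of payload in total. */
struct SixBit {
    unsigned char c1 : 6;
    unsigned char c2 : 6;
    unsigned char c3 : 6;
    unsigned char c4 : 6;
};

/* Four 2-bit fields: 8 bits of payload in total. */
struct TwoBit {
    unsigned char c1 : 2;
    unsigned char c2 : 2;
    unsigned char c3 : 2;
    unsigned char c4 : 2;
};
#pragma pack(pop)

/* The only thing guaranteed by arithmetic: 24 bits cannot fit in
   fewer than 3 bytes. Whether it is 3 or 4 is up to the compiler. */
static_assert(sizeof(struct SixBit) >= 3, "24 bits need at least 3 bytes");

size_t six_bit_size(void) { return sizeof(struct SixBit); }
size_t two_bit_size(void) { return sizeof(struct TwoBit); }
```

Printing the results with something like printf("%u %u", (unsigned)six_bit_size(), (unsigned)two_bit_size()) shows the layout chosen by the compiler in use. The question reports 4 and 1 for Visual Studio 2012; other compilers may report 3 for the first struct if they let packed bit-fields cross a byte boundary.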

2
On

In some implementations, bit fields cannot span across variable boundaries. You can define multiple bit fields within a variable only if their total number of bits fits within the data type of that variable.

In your first example, there are not enough available bits in a CHAR to hold both cChar1 and cChar2 when they are 6 bits each, so cChar2 has to go in the next CHAR in memory. The same goes for cChar3 and cChar4. That is why the total size of BASE64 is 4 bytes, not 3 bytes:

  (6 bits + 2 bits padding) = 8 bits
+ (6 bits + 2 bits padding) = 8 bits
+ (6 bits + 2 bits padding) = 8 bits
+ 6 bits
- - - - - - - - - - 
= 30 bits
= needs 4 bytes

In your second example, there are enough available bits in a CHAR to hold all of cChar1...cChar4 when they are 2 bits each. That is why the total size of BASE64 is 1 byte, not 4 bytes:

  2 bits
+ 2 bits
+ 2 bits
+ 2 bits
- - - - - - - - - - 
= 8 bits
= needs 1 byte
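
The counting above can be written as a small model. This is a simplified sketch of the "a bit-field never straddles two storage units" rule, not the algorithm any real compiler uses; units_needed and its parameters are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many storage units of unit_bits bits are needed when a
   field that does not fit in the current unit starts a new one. */
size_t units_needed(const unsigned *widths, size_t count, unsigned unit_bits)
{
    size_t units = 0;
    unsigned free_bits = 0;            /* bits left in the current unit */

    for (size_t i = 0; i < count; ++i) {
        /* Model limitation: zero-width and oversized fields are not handled. */
        assert(widths[i] > 0 && widths[i] <= unit_bits);

        if (widths[i] > free_bits) {   /* does not fit: open a new unit */
            ++units;
            free_bits = unit_bits;
        }
        free_bits -= widths[i];
    }
    return units;
}
```

Under this model, four 6-bit fields in 8-bit units need 4 units (4 bytes), while four 2-bit fields, as declared in the question's second struct, need 1 unit (1 byte). With 32-bit units the four 6-bit fields share a single unit, which is still 4 bytes.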
1
On

The simple answer is: this is NOT wrong behavior.

Packing tries to put separate chunks of data in bytes, but it can't pack two 6-bit chunks in one 8-bit byte. So the compiler puts them in separate bytes, probably because accessing a single byte for retrieving or storing your 6-bit data is easier than accessing two consecutive bytes and handling some trailing part of one byte and some leading part from another one.

This is implementation defined, and you can do little about that. Probably there is an option for an optimizer to prefer size over speed – maybe you can use it to achieve what you expected, but I doubt the optimizer would go that far. Anyway the size optimization usually shortens the code, not data (as far as I know, but I am not an expert and I may well be wrong here).
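
If exactly 3 bytes are required no matter how the compiler lays out bit-fields, one option is to skip bit-fields and pack the four 6-bit values by hand with shifts and masks. A minimal sketch, assuming the usual Base64 bit order (first value in the most significant bits); pack_sextets and unpack_sextets are hypothetical names, not part of any library:

```c
#include <assert.h>
#include <stdint.h>

/* Pack four 6-bit values (each 0..63) into three bytes, with the
   first value in the high bits of the first byte. */
void pack_sextets(const uint8_t in[4], uint8_t out[3])
{
    for (int i = 0; i < 4; ++i)
        assert(in[i] < 64);            /* precondition: 6-bit values only */

    out[0] = (uint8_t)((in[0] << 2) | (in[1] >> 4));
    out[1] = (uint8_t)(((in[1] & 0x0F) << 4) | (in[2] >> 2));
    out[2] = (uint8_t)(((in[2] & 0x03) << 6) | in[3]);
}

/* Inverse of pack_sextets: recover the four 6-bit values. */
void unpack_sextets(const uint8_t in[3], uint8_t out[4])
{
    out[0] = (uint8_t)(in[0] >> 2);
    out[1] = (uint8_t)(((in[0] & 0x03) << 4) | (in[1] >> 4));
    out[2] = (uint8_t)(((in[1] & 0x0F) << 2) | (in[2] >> 6));
    out[3] = (uint8_t)(in[2] & 0x3F);
}
```

For example, the values 19, 22, 5, 46 (the Base64 indices of "TWFu") pack into the bytes 'M', 'a', 'n'. The storage is a plain 3-byte array, so its size does not depend on #pragma pack or on bit-field layout rules.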