how to convert different length of bits into byte array?

109 Views Asked by At

I am currently working on a project where I have to implement Elias Gamma coding for an Embedded system (STM32F405 programming using C). The main priority is efficiency and speed of the code thus I implenmented a lookup table for the Elias Gamma coding which contains bits values and number of bits for each "bit array" (I am using uint32_t for symbols and uint8_t for lengths). However I am having trouble converting these bits to byte array. Below is an example of what I am trying to do.

For three symbols -> 0b0001001 (bits length 7), 0b011 (bits length 3), b000010001 (bits length 9) I have to get 0b0001001011000010001 thus 0x12 0xC2 0x20 as a byte array to transfer the data through a communication channel like USART.

What confuses me is, bit symbols being different length and "append" process that has to be applied to the bits in a platform that uses bytes as smallest data structure.

Is there a simple way to do this?

Edit: For clarity I am trying to append 4096 "bit array" to X number of bytes. X number depends on the total number of bits appended (which is unknown while prossessing).

3

There are 3 best solutions below

9
ikegami On BEST ANSWER

Solution requires:

// Returns false if nothing left.
int get_next_bits( uint8_t *len, uint32_t *bits );

// Sends the first `output_len` bytes pointed by `output`.
send_buffer( uint16_t output_len, const uint8_t *output );

// Allocated memory or array large enough for output.
// See below if this can't be guaranteed.
uint8_t *buf;

Solution:

uint16_t output_len = 0;
uint8_t *output = buf;     // Assumed to be large enough. But see below.
uint8_t available = 0;     // Number of unassigned bits in `*output`.
uint32_t bits;             // Bits to add to output.
uint8_t unassigned;        // Number of bits (left) in `bits`.

while ( get_next_byte( &unassigned, &bits ) ) {
   while ( unassigned > 0 ) {
      if ( available == 0 ) {
         // We need another byte of output.
         // `output` already points to this byte.
         ++output_len;
         *output = 0;
         available = 8;
      }

      if ( available <= unassigned ) {
         // We get here, for example, when we have
         // 5 bits to insert but only 3 bits available in current output byte.
         // So we shift the unassigned bits by 2 to place into position.
         // And we're left with 0 available bits and 2 unassigned bits.
         // We can't forget to "remove" from `bits` the 3 bits we assigned.
         uint8_t shift = unassigned - available;
         *output |= bits >> shift;
         unassigned = shift;
         bits &= ~( ( (uint32_t)0xFFFFFFFF ) << unassigned );
         available = 0;
         ++output;
      } else {
         // We get here, for example, when we have
         // 3 bits to insert and 5 bits available in current output byte.
         // So we shift the unassigned bits by 2 to place into position.
         // And we're left with 2 available bits and 0 unassigned bits.
         uint8_t shift = available - unassigned;
         *output |= bits << shift;
         unassigned = 0;
         available = shift;
      }
   }
}

send_buffer( output_len, output );

The available == unassigned case can be handled specially if desired.

if ( available == unassigned ) {
   // We get here, for example, when we have
   // 5 bits to insert and 5 bits available in current output byte.
   // We're left with 0 available bits and 0 unassigned bits.
   *output |= bits;
   unassigned = 0;
   available = 0;
   ++output;
}

If you can't guarantee a buffer large enough to hold all the bits, replace

if ( available == 0 ) {
   // We need another byte of output.
   // `output` already points to this byte.
   *output = 0;
   available = 8;
   ++output_len;
}

with

if ( available == 0 ) {
   // We need another byte of output.
   // `output` already points to this byte.

   if ( output_len == buf_size ) {
      // Buffer is full, so send and empty it.
      output -= buf_size;
      output_len = 0;
      send_buffer( buf_size, output );
   }

   *output = 0;
   available = 8;
   ++output_len;
}
2
JohnFilleau On

You need a "system" that takes as input a series of bits and lengths and outputs a series of bytes.

The user will iterate over a series of input bits. When the "system" has enough input to generate bytes, the user can extract them.

I'm imagining an API that looks like this

typedef struct {
    // holds a user-supplied buffer pointer
    char* buf_ptr;
    int buf_size;

    // pointer to usable data within the user-supplied buffer
    char* front;
    int bit_size;
} BitToByteMachine;

// initializes the BitToByteMachine
// The user supplied buffer must be at least large enough to hold the largest
// single input that the user will supply
void bit_to_byte_machine_init(BitToByteMachine*, char* buffer, int buf_size);

// Takes a uint32_t bitfield and a uint8_t num_bits
// Adds the bits into the current machine
// Returns the number of FULL BYTES available for extraction
// Undefined behavior if the buffer is too full to hold the data provided
int bit_to_byte_machine_insert(BitToByteMachine*, uint32_t bitfield, uint8_t num_bits);

// Returns the number of FULL BYTES available
// This can be called when there are still more pending calls to
// bit_to_byte_machine_insert(), for example to start transmitting data
// on a serial line before you've filled up the internal buffer
unsigned char bit_to_byte_machine_full_bytes(BitToByteMachine*);

// Returns the number of FULL OR PARTIAL BYTES available
// This should only be used after all data has been inserted and the only
// thing left to do is finish transmitting the data
unsigned char bit_to_byte_machine_full_or_partial_bytes(BitToByteMachine*);

// Extracts the next available byte
// If the next byte is a FULL BYTE, then will return the FULL BYTE
// If the next byte is a PARTIAL BYTE, then
// it will be zero-padded in the least significant bits
unsigned char bit_to_byte_machine_pop_byte(BitToByteMachine*);

Regular user interaction would look something like this

unsigned char my_buffer[4];
BitToByteMachine machine;
bit_to_byte_machine_init(&machine, my_buffer, 4);

while(there_are_codes_to_insert) {
    uint32_t next_code = get_next_code();
    uint8_t next_code_bit_size = get_next_code_bit_size();

    if (bit_to_byte_machine_insert(next_code, next_code_bit_size)) {
        while(bit_to_byte_machine_full_bytes(&machine)) {
            USART_transmit_byte(bit_to_byte_machine_pop_byte(&machine));
        }
    }
}

// no more codes to insert
// send the last bytes with padding
while (bit_to_byte_machine_full_or_partial_bytes(&machine)) {
    USART_transmit_byte(bit_to_byte_machine_pop_byte(&machine));
}

How does this API work for you?

0
Ian Abbott On

Here is a function that will copy a bit-field value of specified width up to 32-bits into a destination byte string in a big endian fashion. The destination is specified as pointer to the first byte of the destination string and an offset in bits to the position where the most significant bit of the bit-field is to be stored in the destination byte string. The mapping of offsets to corresponding bits in bytes is as follows:

  • Offset 0 is bit 7 in byte 0.
  • Offset 1 is bit 6 in byte 0.
  • Offset 7 is bit 0 in byte 0.
  • Offset 8 is bit 7 in byte 1.

The function contains a while loop that copies up to 8 bits at a time from the source bit-field to the destination. In each iteration, the number of bits copied is the minimum of the number of remaining, uncopied bits in the source bit-field value, and the number of bits from the current destination offset (in bits) to the next multiple of 8 bits.

For convenience, the the function keeps the remaining bits of the source bit-field left shifted so that the most significant remaining bit is at bit position 31. The while loop normalizes the destination position by adding the destination bit position divided by 8 to the destination byte pointer and setting the destination bit position to the remainder.

#include <stdint.h>
#include <stddef.h>

/**
 * \brief Copy a bit-field into a string of bytes in big-endian order.
 *
 * \param dest Pointer to start of destination byte string.
 * \param dest_ofs_bit Offset in bits to location of most-significant bit
 *        of bit-field in destination byte string.
 * \param srcval Bit-field value as a number.
 * \param srcval_nbits Width of bit-field in bits.
 *
 * The destination byte string buffer is assumed to be at least
 * (\a dest_ofs_bit + \a srcval_nbits + 7) / 8 bytes long.
 */
void set_bitfield_in_bytes(uint8_t *dest, size_t dest_ofs_bit,
        uint32_t srcval, unsigned int srcval_nbits)
{
    /*
     * Move bit-field value to most significant end of srcval for convenience.
     */
    if (srcval_nbits < 32 && srcval_nbits > 0)
    {
        srcval <<= (32 - srcval_nbits);
    }
    while (srcval_nbits > 0)
    {
        unsigned int dest_shift;
        unsigned int nbits;
        uint8_t src_byte;
        uint8_t dest_mask;

        /* Adjust destination byte pointer and offset. */
        dest += dest_ofs_bit / 8;
        dest_ofs_bit %= 8;
        /* Extract up to 8 bits from srcval. */
        src_byte = srcval >> (24 + dest_ofs_bit);
        /* Assume all remaining bits in destination byte will be filled. */
        nbits = 8 - dest_ofs_bit;   /* Number of bits to fill. */
        dest_shift = 0;             /* No left shift. */
        dest_mask = 1u << nbits;    /* Bit-mask will be adjusted below. */
        if (nbits > srcval_nbits)
        {
            /* Not enough bits remaining in source bit-field, so adjust. */
            dest_shift = nbits - srcval_nbits;  /* Left shift. */
            nbits = srcval_nbits;               /* Number of bits to fill. */
        }
        dest_mask -= (1u << dest_shift);    /* Bit-mask of bits to fill. */
        /* Update bits in destination byte. */
        *dest = (*dest & ~dest_mask) | (src_byte & dest_mask);
        /* Shift remaining source bits to most-significant end. */
        srcval <<= nbits;
        /* Account for the number of bits copied in this iteration. */
        dest_ofs_bit += nbits;
        srcval_nbits -= nbits;
    }
}

Note that the function only modifies the specified bits of the destination byte string. If padding is required, the caller can either initialize the final byte of the destination with the padding bits before filling in the bit-fields that overwrite it, or can call the function with an extra bit-field to set the padding explicitly.

The following code tests the function with 4 consecutive bit-fields values including a padding bit-field. The bit-field values and their widths and offsets are:

  • value 9, width 7, offset 0
  • value 3, width 3, offset 7
  • value 17, width 9, offset 10
  • value 0 (padding), width 5, offset 19

The total length is 24 bits or 3 bytes.

#include <stdint.h>
#include <stdio.h>

void set_bitfield_in_bytes(uint8_t *dest, size_t dest_ofs_bit,
        uint32_t srcval, unsigned int srcval_nbits);

int main(void)
{
    uint8_t bytes[3] = { 0xff, 0xff, 0xff };

    set_bitfield_in_bytes(bytes, 0, 9, 7);
    set_bitfield_in_bytes(bytes, 7, 3, 3);
    set_bitfield_in_bytes(bytes, 7 + 3, 17, 9);
    set_bitfield_in_bytes(bytes, 7 + 3 + 9, 0, 5); /* padding */
    printf("%02x %02x %02x\n", bytes[0], bytes[1], bytes[2]);
}

Output:

12 c2 20

The function can still be used in a scenario where the destination buffer is a fixed size, the total length of the source bit-fields is longer than the buffer size in bits (buffer size in bytes multiplied by 8), and the destination buffer is flushed to some output device periodically. The caller just needs to be careful to neither overrun the buffer, nor send partially filled bytes to the output device. If the destination buffer is a cyclic buffer, it just needs 4 bytes (32 bits) of extra space past the end of the cyclic buffer to deal with bit-fields that write past the end of the buffer, and then it can move the excess bytes to the start of the buffer and adjust the destination offset.