I'm trying to write my own arithmetic compressor. I'm cheating a little bit by referencing the implementation I found at https://marknelson.us/posts/2014/10/19/data-compression-with-arithmetic-coding.html . Specifically, this part:
(Note that this code is incomplete and the author later updates this code to be complete, but for the purpose of my question, this part will suffice.)
unsigned int high = 0xFFFFFFFFU;
unsigned int low = 0;
char c;
while ( input >> c ) {
    int range = high - low + 1;
    prob p = model.getProbability(c);
    high = low + (range * p.upper) / p.denominator;
    low = low + (range * p.lower) / p.denominator;
    for ( ; ; ) {
        if ( high < 0x80000000U )
            output_bit( 0 );
        else if ( low >= 0x80000000U )
            output_bit( 1 );
        else
            break;
        low <<= 1;
        high <<= 1;
        high |= 1;
    }
}
Below this code, the author has the following description:
We shift in a 1 into the least significant bit of high, and a 0 into the least significant bit of low. Thus, we keep working on 32 bits of precision by expelling the bits that no longer contribute anything to the precision of our calculations. In this particular implementation, we just keep 32 bits in working registers, with some additional number already sent to the output, and some other number pending for input.
I have several questions about this. First, I don't quite understand why we shift a 1, as opposed to a 0, into high. Is it because we want to maximize the value of high? Second, the author seems to claim that a 0 was shifted into the least significant bit of low, but I don't see where this is done explicitly, and if it isn't explicit, I don't see how the algorithm above guarantees that the least significant bit of low is 0.
This applies to both low and high: `<<= 1` always shifts a zero into the low bit.