Say there's an array of 1024 bits that are all zeros:
example: [0,0,0,0,0,0,0,...]
Then I overwrite 20 zeros with ones at completely random positions:
example: [0,1,0,0,0,0,0,...]
What is the theoretical minimum number of bits needed to encode the location of these 20 randomly placed bits, assuming that I had a perfect encoder?
I know there are communication theory equations that will tell me this, but I want to double check my calculations.
Harder bonus question: Show me the code for an algorithm that implements an encoding that approaches this minimum limit.
Bonus bonus: What if the flips were at the byte level instead of the bit level, i.e. entire bytes flipped? Same result?
If you treated a string of 200 bits as an array of twenty 10-bit numbers, each giving the absolute position of one of the one-bits, you'd be saving 824 bits compared to storing the raw 1024-bit array.
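A minimal sketch of that naive scheme, packing each position into a fixed-width 10-bit field (the function names are just illustrative):

```python
def encode_absolute(positions, width=10):
    """Pack each position into `width` bits and return the concatenated bit string."""
    return "".join(format(p, f"0{width}b") for p in sorted(positions))

def decode_absolute(bits, width=10):
    """Recover the sorted list of positions from the packed bit string."""
    return [int(bits[i:i + width], 2) for i in range(0, len(bits), width)]

positions = [3, 17, 254, 900]         # e.g. four of the twenty one-bit indices
encoded = encode_absolute(positions)  # 4 * 10 = 40 bits; 20 positions -> 200 bits
assert decode_absolute(encoded) == sorted(positions)
```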
But I don't think this is the minimum. For example, if you treat each number as a distance from the previous one-bit instead of an absolute position, some analysis may show that on average you'd only need, say, 8 bits to encode the gap to the next one-bit. So add a flag bit to the front: when it's 0, 200 bits of absolute positions follow; when it's 1, 160 bits of relative positions follow. This should lower the average number of bits needed to encode the full value.
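Here's a rough sketch of that flag-plus-delta idea: sort the positions, store gaps from the previous position, and fall back to absolute positions whenever a gap overflows the chosen delta width (the 10/8-bit widths are the assumptions from above, not anything fundamental):

```python
def encode(positions, abs_width=10, delta_width=8):
    """Return '1' + 8-bit gaps when they all fit, else '0' + 10-bit absolute positions."""
    positions = sorted(positions)
    deltas = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
    if all(d < 2 ** delta_width for d in deltas):
        return "1" + "".join(format(d, f"0{delta_width}b") for d in deltas)
    return "0" + "".join(format(p, f"0{abs_width}b") for p in positions)

def decode(bits, abs_width=10, delta_width=8):
    flag, body = bits[0], bits[1:]
    width = delta_width if flag == "1" else abs_width
    values = [int(body[i:i + width], 2) for i in range(0, len(body), width)]
    if flag == "0":
        return values               # absolute positions, already sorted
    positions, total = [], 0
    for d in values:                # undo the delta encoding with a running sum
        total += d
        positions.append(total)
    return positions
```

With 20 one-bits spread over 1024 positions the average gap is about 51, so the 8-bit delta path is taken most of the time and the absolute fallback only kicks in when some gap exceeds 255.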
Generalizing, this is simply data compression. There are probably many compression algorithms that could reduce the average number of bits required to encode your "twenty one-bits in 1024" to a very small number. Computing a suitable binary tree, storing a representation of it, then storing the bits required to traverse the tree would likely yield a very efficient algorithm; this is essentially Huffman coding, which is the basis of much of modern data compression.
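As a sketch of that tree idea, you could build a Huffman code over the gap lengths. The gap distribution below is just estimated by sampling random 20-of-1024 patterns, and in practice the code table (or an agreed-upon model) would also have to be stored or shared, which this toy example ignores:

```python
import heapq
import random
from collections import Counter

def huffman_code(frequencies):
    """Build a Huffman code (symbol -> bit string) from a {symbol: count} map."""
    heap = [[count, [symbol, ""]] for symbol, count in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[1:]:
            pair[1] = "0" + pair[1]   # left branch
        for pair in hi[1:]:
            pair[1] = "1" + pair[1]   # right branch
        heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
    return dict(heap[0][1:])

# Estimate the gap-length distribution by sampling many random 20-of-1024 patterns.
samples = 10_000
gap_counts = Counter()
for _ in range(samples):
    positions = sorted(random.sample(range(1024), 20))
    gaps = [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]
    gap_counts.update(gaps)

code = huffman_code(gap_counts)
avg_bits = sum(gap_counts[g] * len(code[g]) for g in gap_counts) / samples
print(f"average Huffman-coded size: {avg_bits:.1f} bits for the 20 positions")
```

The average printed here lands well below the 200-bit naive encoding and close to the delta-encoding figure above, which is what you'd expect since Huffman coding gets within about one bit per symbol of the entropy of the gap distribution.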