(sorry, I'm coming up with some funny ideas... bear with me...)
Let's say I have a 'double' value, consisting of:
implicit
sign exponent bit mantissa
0 10000001001 (1).0011010010101010000001000001100010010011011101001100
representing 1234.6565 if I'm right.
I'd like to be able to access the fields of sign, exponent, implicit and mantissa separately as bits!, and manipulate them with bitwise operations like AND, OR, XOR ... or string operations like 'left', mid etc.
And then I'd like to puzzle a new double together from the manipulated bits.
e.g. setting the sign bit to 1 would make the number negative, adding or subtracting 1 to/from the exponent would double/halve the value, stripping all bits behind the position indicated by the recalculated (unbiased) value of the exponent would convert the value to an integer and so on.
Other tasks would/could be to find the last set bit, calculate how much it contributes to the value, check if the last bit is '1' (binary 'odd') or '0' (binary 'even') and the like.
I have seen similar in programs, just can't find it on the fly. I may remember something with 'reinterpret cast' or similar? I think there are libraries or toolkits or 'howtos' around which offer access to such, and hope here are reading people who can point me to such.
I'd like a solution near simple processor instructions and simple C code. I'm working in Debian Linux and compiling with gcc which was in by default.
startpoint is any double value i can address as 'x',
startpoint 2 is I'm not! an experienced programmer :-(
How to do easy, and get it working with good performance?
This is straightforward, if a bit esoteric.
Step 1 is to access the individual bits of a
floatordouble. There are a number of ways of doing this, but the commonest are to use achar *pointer, or a union. For our purposes today let's use a union. [There are subtleties to this choice, which I'll address in a footnote.]So now
x.bitslets us access the bits and bytes of ourdoublevalue as a 64-bit unsigned integer. First, we could print them out:This prints
and we're on our way.
The rest is "simple" bit manipulation. We'll start by doing it the brute-force, obvious way:
This prints
and these values exactly match the bit decomposition you showed in your question, so it looks like you were right about the number 1234.6565.
What we have so far are the raw exponent and mantissa values. As you know, the exponent is offset, and the mantissa has an implicit leading "1", so let's take care of those:
(Actually this isn't quite right. Soon enough we're going to have to address some additional complications having to do with denormalized numbers, and infinities and NaNs.)
Now that we have the true mantissa and exponent, we can do some math to recombine them, to see if everything is working:
But if you try that, it gives the wrong answer, and it's because of a subtlety that, for me, is always the hardest part of this stuff: Where is the decimal point in the mantissa, really? (Actually, it's not a "decimal point", anyway, because we're not working in decimal. Formally it's a "radix point", but that sounds too stuffy, so I'm going to keep using "decimal point", even though it's wrong. Apologies to any pedants whom this rubs the wrong way.)
When we did
mantissa * pow(2, exponent)we assumed a decimal point, in effect, at the right end of the mantissa, but really, it's supposed to be 52 bits to the left of that (where that number 52 is, of course, the number of explicit mantissa bits). That is, our hexadecimal mantissa0x134aa04189374c(with the leading 1 bit restored) is actually supposed to be treated more like0x1.34aa04189374c. We can fix this by adjusting the exponent, subtracting 52:So now
checkis 1234.6565 (plus or minus some roundoff error). And that's the same number we started with, so it looks like our extraction was correct in all respects.But we have some unfinished business, because for a fully general solution, we have to handle "subnormal" (also known as "denormalized") numbers, and the special representations
infandNaN.These wrinkles are controlled by the exponent field. If the exponent (before subtracting the bias) is exactly 0, this indicates a subnormal number, that is, one whose mantissa is not in the normal range of (decimal) 1.00000 to 1.99999. A subnormal number does not have the implicit leading "1" bit, and the mantissa ends up being in the range from 0.00000 to 0.99999. (This also ends up being the way the ordinary number 0.0 has to be represented, since it obviously can't have that implicit leading "1" bit!)
On the other hand, if the exponent field has its maximum value (that is, 2047, or 211-1, for a double) this indicates a special marker. In that case, if the mantissa is 0, we have an infinity, with the sign bit distinguishing between positive and negative infinity. Or, if the exponent is max and the mantissa is not 0, we have a "not a number" marker, or
NaN. The specific nonzero value in the mantissa can be used to distinguish between different kinds ofNaN, like "quiet" and "signaling" ones, although it turns out the particular values that might be used for this aren't standard, so we'll ignore that little detail.(If you're not familiar with infinities and NaNs, they're what IEEE-754 says that certain operations are supposed to return when the proper mathematical result is, well, not an ordinary number. For example,
sqrt(-1.0)returnsNaN, and1./0.typically givesinf. There's a whole set of IEEE-754 rules about infinities and NaNs, such as thatatan(inf)returns π/2.)The bottom line is that instead of just blindly tacking on the implicit 1 bit, we have to check the exponent value first, and do things slightly differently depending on whether the exponent has its maximum value (indicating specials), an in-between value (indicating ordinary numbers), or 0 (indicating subnormal numbers):
That last adjustment, adding 1 to the exponent for subnormal numbers, reflects the fact that subnormals are "interpreted with the value of the smallest allowed exponent, which is one greater" (per the Wikipedia article on subnormal numbers).
I said this was all "straightforward, if a bit esoteric", but as you can see, while extracting the raw mantissa and exponent values is indeed pretty straightforward, interpreting what they actually mean can be a challenge!
If you already have raw exponent and mantissa numbers, going back in the other direction — that is, constructing a
doublevalue from them — is just about as straightforward:This answer is getting too long, so for now I'm not going to delve into the question of how to construct appropriate exponent and mantissa numbers from scratch for an arbitrary real number. (Me, I usually do the equivalent of
x.d = atof(the number I care about), and then use the techniques we've been discussing so far.)Your original question was about "bitwise splitting", which is what we've been discussing. But it's worth noting that there's a much more portable way to do all this, if you don't want to muck around with raw bits, and if you don't want/need to assume that your machine uses IEEE-754. If you just want to split a floating-point number into a mantissa and an exponent, you can use the standard library
frexpfunction:This prints
and that looks right, because 0.602859619140625 × 211 = 1234.6565 (approximately). (How does it compare to our bitwise decomposition? Well, our mantissa was
0x34aa04189374c, or0x1.34aa04189374c, which in decimal is 1.20571923828125, which is twice the mantissa thatldexpjust gave us. But our exponent was 1033 - 1023 = 10, which is one less, so it comes out in the wash: 1.20571923828125 × 210 = 0.602859619140625 × 211 = 1234.6565.)There's also a function
ldexpthat goes in the other direction:This prints
1234.656500again.Footnote: When you're trying to access the raw bits of something, as of course we've been doing here, there are some lurking portability and correctness questions having to do with something called strict aliasing. Strictly speaking, and depending on who you ask, you may need to use an array of
unsigned charas the other part of your union, notuint64_tas I've been doing here. And there are those who say that you can't portably use a union at all, that you have to usememcpyto copy the bytes into a completely separate data structure, although I think they're taking about C++, not C.