I'm trying to write a binary 8-bit floating-point addition algorithm for a PicoBlaze microcontroller (1 sign bit, 4 exponent bits, and 3 mantissa bits).
I got it to work with positive numbers but I can't figure out how to do it when there are negative numbers too.
My main problem is setting the sign bit of the result. Can someone explain how to set it correctly?
My idea was to check the sign of both numbers. If they're both positive, set the sign to 0; if they're both negative, set the sign to 1; in either case, use the same method as before for the addition. If one is negative and one is positive, compare the numbers and use the sign bit of the larger one. But I'm not sure how to compare the two numbers, and the code is getting a little cluttered. Is there a better way to do it?
You're in luck. Assuming you're using an IEEE 754-like representation (i.e., the exponent is stored with an appropriate bias), you can simply compare the bit strings lexicographically after a bit of massaging. Note that this assumes you have already handled NaN values appropriately, since NaNs should simply propagate through your adder.
The trick is this: if the sign bit is 0, flip it to 1; if the sign bit is 1, flip all the bits.
Now you can compare these two bit strings lexicographically: the one that comes earlier in dictionary order is the smaller number. You might have to be careful about how you process -0, but I suspect that's not really a big issue for you.
In fact, this is precisely the reason why exponents are stored with a bias: so that you can compare floats by simply treating them as unsigned numbers, after doing the bit-flip trick I mentioned above.
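Here is a minimal Python sketch of the comparison, assuming the usual form of the bit-flip (set the sign bit on non-negative patterns, invert every bit of negative ones) and the 1/4/3 layout from the question. The names `order_key` and `result_sign` are made up for illustration, and NaN handling is assumed to have happened earlier. For the sign of a sum you only need to compare magnitudes, which under the same biased-exponent assumption is an unsigned compare of the low 7 bits.

```python
SIGN_MASK = 0x80  # bit 7: sign (layout assumed from the question: 1/4/3)
MAG_MASK = 0x7F   # bits 6..0: biased exponent followed by mantissa


def order_key(bits: int) -> int:
    """Map an 8-bit float pattern to an unsigned key that sorts like its value."""
    bits &= 0xFF
    if bits & SIGN_MASK:
        # Negative: invert everything so larger magnitudes sort lower.
        return ~bits & 0xFF
    # Non-negative: set the sign bit so it sorts above every negative.
    return bits | SIGN_MASK


def result_sign(a: int, b: int) -> int:
    """Sign bit of a + b, taken from the operand with the larger magnitude."""
    mag_a = a & MAG_MASK
    mag_b = b & MAG_MASK
    opposite_signs = ((a ^ b) & SIGN_MASK) != 0
    if opposite_signs and mag_a == mag_b:
        # Exact cancellation: this sketch returns +0, as IEEE 754 does
        # under round-to-nearest.
        return 0
    larger = a if mag_a >= mag_b else b
    return (larger >> 7) & 1
```

For example, with an exponent bias of 7 (an assumption; the question does not state it), `0x38` encodes 1.0 and `0xC0` encodes -2.0, so `result_sign(0x38, 0xC0)` is 1. Note that `order_key` places -0 (`0x80`) just below +0 (`0x00`), which is the -0 caveat mentioned above.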