Multiplying floats "manually" with integer operations

1.4k Views Asked by At

I'm trying to implement a floating point multiply without using FP hardware instructions.

I think my code works for the sign bit and exponents bits, but not the mantissa.

The general idea:
1. Add exponents of those two numbers.
2. Multiply their mantissas.
3. Normalize the mantissa.
4. Add to exponent the part got from normalizing mantissa.
I ignore the sign bit for now since I test it on values higher than 0.

And here is the problem: I tried to multiply those two mantissas and then - since the result would be in two registers edx:eax - shifting bits one by one from edx to eax meanwhile increasing exponent. But it doesn't seem to work, so I wonder if my idea is good, or maybe there is some better way to do it?


Here is what I have already written in MASM:

mov eax, [ebp+8] ;put into eax one of numbers to multiply
mov ecx, a ;in ecx is second number to multiply, constant = 1.8

and ecx, 7F800000H ;mask to get exponent
and eax, 7F800000H

shr ecx, 23
shr eax, 23

sub ecx, 127
sub eax, 127

add ecx, eax ;exponent of the final number - later should be added part got from mantissa

mov eax, [ebp+8]
mov edx, a
and eax, 007FFFFFH ;getting mantissa
and edx, 007FFFFFH

; editor's note: unsure if there were any unlisted instructions
; between the two code in the original

mul edx    ; multiply the mantissas

mov ebx, 0

spr:
    cmp edx, 0 ;check if edx is cleared out
    jne przesun
    je dalej

przesun:
    inc ecx
    shr eax, 1 ;making space for new bit
    shr edx, 1 ;put bit to CF
    bts eax, 31 ;putting bit from CF   ; Bug #1, see Michael's answer
    jmp spr

dalej:
    shr eax, 7
    shl ecx, 23
    add eax, ecx ;result of multiplying

The result is 0 for every number I tried multiplying with 1.8.
(atm I test it on number 15, so the result should be 27)

2

There are 2 best solutions below

2
Michael On
bts eax, 31 ;putting bit from CF

^ BTS doesn't do what you seem to think it does.

Quoting from Intel's manual (emphasis added):

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the bit string to 1. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value.

So you're always setting the bit to 1, regardless of the value of the bit you just shifted out.

There are other instructions that you can use to accomplish what you're trying to do:

shrd eax, edx, 1  ; Shift eax 1 bit to the right, with the new MSB shifted in from edx
shr edx,1         ; The shrd above doesn't modify edx, so discard the old LSB of edx 

or:

shr edx, 1   ; CF = edx.0
rcr eax, 1   ; rotate through carry; shift in CF from the left and shift out eax.0
1
Peter Cordes On

Your algo sounds reasonable. This floating point converter might be useful quickly see what the bit pattern should be for any given number.

Since your wrong-answer is zero, your remaining bug might not be in your code exactly, but in how you're getting the result back to the rest of your program. Try with bigger numbers, or set eax to non-zero manually in the debugger.

asm style: Your loop that shifts bits from one reg to another is poorly implemented. (besides the fact it's unneeded, see below). Instead of an unconditional jmp back to a test at the top, you should test&branch at the start to skip the loop if needed, and then put another test&branch at the bottom of the loop to repeat only the instructions that need to be in the loop.

; mov ebx, 0    ; was this supposed to be ecx?
                ; ebx doesn't show up anywhere else in your code
xor  ebx, ebx

spr:
    ; cmp edx, 0 ;check if edx is cleared out
    test edx, edx  ; shorter encoding when testing for 0
    jz dalej     ; jz and je are the same instruction
    ; else fall through into the loop.  Your old version used two branches here >.<

przesun:
    inc ecx
    shr eax, 1 ;making space for new bit
    shr edx, 1 ;put bit to CF
    ; bts eax, 31 ;bug, but Michael's answer covered that
    test edx,edx
    jnz  przesun
dalej:

Yes, it's better to repeat the test&branch if that reduces jumps. It may also improve CPU branch prediction performance if some inputs skip the loop, but when they don't they have the same number of iterations.

test/jcc is about the same cost as jcc alone, but takes more space.

You might be able to save an instruction by taking advantage of the fact that shr sets the Zero flag based on the result. But prob not in this case, as you need to put the bit into eax, which will set flags.


When you combine the exponent and mantissa, it would make more sense to use an or instruction, instead of add. It's not going to make the code smaller or faster, but the usual way to combine different parts of a bit-field is with or. You don't need or want carry between bits (which won't actually happen because one value is zero everywhere the other can have ones).

shr eax, 7    ; mantissa
shl ecx, 23   ; exponent
    ;; add eax, ecx ;result of multiplying
or  eax, ecx  ; combined result

Actually, this might be another case where you could use shrd instead of two shifts and an or.

Or you could work with exponents in their "proper" position, and leave the low 23bits all zero when you add exponents. Instead of inc, you'd add 1<<23. (or shiftcount << 23 without the loop). You still need to get the mantissa's signs to/from the sign bit.

xor might be useful for handling the sign bit. a ^ b has the same sign bit as a * b.


Of course, in this case you shouldn't use a loop at all. Like I commented on Michael's answer, you should use 32-lzcnt to count how many bits there are, then do it with one shrd. You can zero the source reg after with xor edx, edx, if you want. (bsr+1 after testing for non-zero, instead of 32-lzcnt, is an alternative if you want your code to run on CPUs without lzcnt)

This should still work for denormal results. The upper 32 is zero, and the lower 32 has leading zeros. However, if your exponent is already at the minimum, I guess there's nothing you can do but leave it denormal.

xoring a register with itself the the canonical idiom for zeroing. It takes fewer instruction bytes than mov edx, 0, and is just as fast. (CPUs recognize it as not depending on the previous value of the register, so it doesn't delay out-of-order execution).