A simple floating-point addition x+y with precision 4 (i.e. IEEE mantissa width 3) and 3 bits for the exponent (emax=3, emin=-4), for x = mpfr('0.75') and y = mpfr('0.03125'), incorrectly gives mpfr('0.75') as the result when it should be mpfr('0.8125'). Note that 0.03125 is a subnormal number for this reduced-precision format.
Edit: Terminal interaction extracted from link and included for future reference.
>>> "{0:.10Df}".format(mpfr('0.75')+mpfr('0.03125'))
'0.7500000000'
>>> get_context()
context(precision=4, real_prec=Default, imag_prec=Default,
round=RoundToNearest, real_round=Default, imag_round=Default,
emax=3, emin=-4,
subnormalize=True,
trap_underflow=False, underflow=False,
trap_overflow=False, overflow=False,
trap_inexact=False, inexact=True,
trap_invalid=False, invalid=False,
trap_erange=False, erange=False,
trap_divzero=False, divzero=False,
trap_expbound=False,
allow_complex=False)
>>>
Disclaimer: I maintain gmpy2.
I believe it is a bug with creating subnormals from a string. I think it is fixed in the development code but I won't be able to test until later. I'll update this answer later.
Update
The problem is not related to creating a subnormal from a string. In this case, the subnormal value is created properly. In gmpy2 2.0.x, there is a rare bug when converting a string to a subnormal. The simplest work-around is to convert the input to an mpq type first; i.e. mpfr(mpq('0.03125')).
The actual problem is the default rounding mode. The intermediate sum is exactly halfway between two 4-bit values. The default rounding mode of RoundToNearest selects the rounded value with a final bit of 0. If you change the rounding mode to RoundUp, you get the expected result.
One last comment: the values of precision, emax and emin are slightly different between the IEEE standards and the MPFR library. If e is the exponent size and p is the precision (in IEEE terms), then precision should be p+1, emax should be 2**(e-1) and emin should be 4-emax-precision. This doesn't impact your question since it only changes emax.
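To make the halfway case concrete: 0.75 + 0.03125 = 0.78125, which is 0.11001 in binary, exactly between the 4-bit values 0.1100 (0.75) and 0.1101 (0.8125). The following is a minimal sketch using only Python's standard library rather than gmpy2. The helper names round_to_bits and mpfr_context_params are made up for illustration, and the rounding helper ignores the exponent range and subnormal handling, which does not matter for this particular sum. The second helper just evaluates the parameter formulas given above.

```python
from fractions import Fraction


def round_to_bits(x, bits, mode="nearest-even"):
    """Round a positive Fraction to `bits` significant binary digits.

    Illustrative helper only (not part of gmpy2); it ignores emax/emin.
    """
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find exp such that 2**(exp-1) <= x < 2**exp (MPFR-style exponent).
    exp = x.numerator.bit_length() - x.denominator.bit_length()
    if x >= Fraction(2) ** exp:
        exp += 1
    ulp = Fraction(2) ** (exp - bits)      # spacing of representable values
    scaled = x / ulp                       # x measured in units of ulp
    low = scaled.numerator // scaled.denominator
    rem = scaled - low                     # fractional part, 0 <= rem < 1
    if mode == "nearest-even":
        # Ties go to the candidate whose last bit is 0.
        if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and low % 2 == 1):
            low += 1
    elif mode == "up":
        if rem > 0:
            low += 1
    else:
        raise ValueError("unknown mode: " + mode)
    return low * ulp


def mpfr_context_params(e, p):
    """Apply the formulas above: e = IEEE exponent size, p = IEEE precision."""
    precision = p + 1
    emax = 2 ** (e - 1)
    emin = 4 - emax - precision
    return precision, emax, emin


total = Fraction(3, 4) + Fraction(1, 32)   # 0.75 + 0.03125 = 25/32
print(float(round_to_bits(total, 4)))        # 0.75 (tie, last bit 0 wins)
print(float(round_to_bits(total, 4, "up")))  # 0.8125
print(mpfr_context_params(3, 3))             # (4, 4, -4)
```

The "up" mode here plays the role of RoundUp in the answer, and the last line shows that for e=3, p=3 only emax differs from the context in the question (4 instead of 3).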