Are there non-negative floating points a,b,c such that a + (b - a) * c > b with a < b and c < 1?

Question

Are there non-negative floating points a,b,c such that a + (b - a) * c > b with a < b and c < 1?

74 Views Asked by Ben At 06 March 2024 at 10:06

This question is related to When does (b - a) + a != b exactly in floating point but with an additional multiplicator c that must be strictly smaller than 1.

Formally I am asking myself if there exist any floating point numbers a, b, and c such that

0 <= a < b
0 <= c < 1
a + (b - a) * c > b.

If there exists such a triple I am further wondering if we can find normal floating point numbers or if some of the three need to be subnormal.

First of all to find such a triple it is pretty clear that if it works it will work with c = 1. - epsilon, i.e. the largest floating point smaller than 1. So to find suiting values for a and b I tried multiple things without success:

Generate a lot of random values
Generate pairs a = i / n and b = j / n with 0 <= i < j <= n as suggested in this answer: https://stackoverflow.com/a/51383057/23137796. I tried for all (i,j,n) pairs with n <= 1000. I could find many pairs with a + (b - a) > b but none with a + (b - a) * c > b.
I tried to generate a and b values that are close to each other (i.e. only epsilon away from each other or some constant multiple of epsilon away, up to 100). I couldn't find any pair that satisfies a + (b - a) > b.

Edit: Thank you for everyone presenting examples with rounding mode towards infinity, I can see that with this rounding mode we can find examples. I am mostly interested in rounding mode "Round to nearest, ties to even" that is used in many programming languages by default, should have mentioned this to begin with.

Original Q&A

There are 1 best solutions below

**Eric Postpischil** · Answer 1 · 2024-03-06T14:16:36.447000

Summary

There are such numbers when rounding toward +∞ is used. There are not such numbers when rounding to nearest with ties to even is used in a binary floating-point format. There are such numbers in non-binary formats.

In this answer, text in code font represents floating-point arithmetic. Text not in code font represents real-number arithmetic.

Rounding Toward +∞

Consider IEEE-754 binary32 with rounding toward +∞. Let a be ½−2⁻²⁵, the representable number just below ½. Let b be 1. Let c be 1−2⁻²⁴, the representable number just below 1. b−a (mathematical subtraction) is ½+2⁻²⁵. That is not representable, so b-a (floating-point subtraction) with rounding toward +∞ produces ½+2⁻²⁴. b-a•c is ½+2⁻²⁴−2⁻²⁵-2⁻⁴⁸, so (b-a)*c produces ½+2⁻²⁴. Then a+(b-a)*c is ½−2⁻²⁵+½+2⁻²⁴ = 1−2²⁵+2⁻²⁴ = 1+2⁻²⁵. 1+2⁻²⁵ is not representable, so rounding toward +∞ produces 1+2⁻²³, so a + (b-a)*c produces 1+2⁻²³, which is greater than b.

Demonstrating in C, the output of:

#include <fenv.h>
#include <stdio.h>
#include <math.h>

#pragma STDC FENV_ACCESS ON


int main(void)
{
    fesetround(FE_UPWARD);
    float a = nexttowardf(.5f, 0);
    float b = 1;
    float c = nexttowardf(1, 0);
    printf("%a\n", a + (b-a)*c);
}

is:

0x1.000002p+0

Round to Nearest With Binary Floating-Point

This is not possible in a binary format with round-to-nearest-ties-to-even. Proof:

Consider, that, if a + (b-a)*c is greater than b, it is at least one ULP of b greater than b. Let u be the ULP of b. Consider first a + (b-a). Since a < b, b-a ≤ b, so the ULP of b-a is at most the UP of b, u. With round-to-nearest, the maximum error in a single operation is ½ ULP, for the ULP of the result. There are values for a and b where b-a rounds up by ½ ULP, such as when we subtract 1½•2⁻²³ from 2−2⁻²³. (2−2⁻²³ − 1½•2⁻²³ = 2 − 2½•2⁻²³, which rounds up to 2 − 2•2⁻²³.)

So, for b-a, it is possible to produce a result of b − a + ½ u.

Then when we add a, there are again values for which the addition in a + (b-a) rounds up by ½ ULP. We know from above the mathematical result is at most a + b − a + ½u = b + ½u. This is halfway to the next representable number after b, so it will, with round-to-nearest-ties-to-even, round up if the next representable number has an even low digit in its significand. The numbers above are an example of that; with a = 1½•2⁻²³ and b = 2−2⁻²³, a + (b-a) yields 2, which is greater than b.

Observe that we just barely achieved this. The final addition rounded up because there was exactly ½u rounding error in b-a, the maximum possible. It is not possible to arrange for the result of a + (b-a) to be any greater than b + ½u. So, when we consider a + (b-a)*c, we must not lose any of this achieved rounding error. The result of (b-a)*c must be b-a and not any less, or a + (b-a)*c will be lower than the value needed to allow rounding up.

The greatest c can be is the next representable number less than 1. The above examples use binary32, but we will consider binary formats generally, so the greatest c is 1−2^−p, where p is the precision of the format (the number of digits in the significand).

b-a•c is necessarily less than b-a, so we want (b-a)*c to round up to compensate for that. Suppose b-a has the largest significand in a binary format, 2−2^1−p (using [1, 2) for the normal significand interval). c also has this significand. (It is 1−2^−p, which normalizes to 2−2^1−p.) Then product of the significands in b-a•c is (2−2^1−p)² = 4 − 2•2^1−p + (2^1−p)² = 4 − 2^2−p + 2^2−2p. Scaling that to the normal interval yields 2 − 2^1−p + 2^1−2p.

2 − 2^1−p is the largest significand, and its ULP is 2^1−p, so the product does not exceed the largest significand by at least ½ ULP unless 2^1−2p ≥ ½•2^1−p, implying p ≤ 1. (I will leave it to the reader to consider whether there is a solution with p = 1 and what to replace round-to-nearest-ties-to-even with when all significands of nonzero numbers have an odd low digit.)

Therefore the product of this largest possible significand of a-b with c cannot round up, and no lower significand can round up either. Therefore, we cannot achieve the necessary rounding error to make a + (b-a)*c exceed b.

Bonus

For the p = 1 case left to the reader above, the representable numbers are … ¼, ½, 1, 2, 4,… The largest possible value for c is ½. We can consider any representable number for b, as there is only one significand (1) for non-zero numbers, and all behaviors will scale as the exponent is changed. Let’s choose b = 4, which makes a at most 2. For b-a, the result for any a is at most b, so let’s replace a + (b-a)*c with a + b*c. This is a + 4*½, so a + 2. Since a ≤ 2, a + 2 ≤ 4. So there is no solution in this degenerate floating-point format.

Decimal Format

Consider a two-digit decimal format with round-to-nearest. Let a be 15, b be 130, and c be .99. Then b-a = 115, so b-a rounds to 120, and b-a•c = 118.8, so (b-a)*c rounds to 120. Then a + (b-a)*c = 15 + 120 = 135, so a + (b-a)*c rounds to 140, and 140 is greater than b.

Are there non-negative floating points a,b,c such that a + (b - a) * c > b with a < b and c < 1?

There are 1 best solutions below

Summary

Rounding Toward +∞

Round to Nearest With Binary Floating-Point

Bonus

Decimal Format

Related Questions in FLOATING-POINT

Related Questions in IEEE-754

Trending Questions

Popular # Hahtags

Popular Questions