Ensuring a float is smaller than the exact value


I want to calculate a sum of the following form in C++:

float result = float(x1)/y1 + float(x2)/y2 + ... + float(xn)/yn

All xi and yi are integers. The result will be an approximation of the actual value, and it is crucial that this approximation is smaller than or equal to the actual value. I can assume that all my values are finite and positive. I tried using nextafterf(..., 0), as in this code snippet:

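#include <cmath>
#include <iostream>

int main()
{
    std::cout.precision( 15 );
    float a = 1.0f / 3.0f * 10; // 3 1/3
    float b = 2.0f / 3.0f * 10; // 6 2/3
    float af = std::nextafterf( a, 0 ); // next float toward zero from a
    float bf = std::nextafterf( b, 0 ); // next float toward zero from b
    std::cout << a << std::endl;
    std::cout << b << std::endl;
    std::cout << af << std::endl;
    std::cout << bf << std::endl;
    float sumf = 0.0f;
    for ( int i = 1; i <= 3; i++ )
    {
        sumf = sumf + bf; // add 6 2/3 three times
    }
    sumf = sumf + af;     // then add 3 1/3
    std::cout << sumf << std::endl;
}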

The exact result is 3 * 6.666... + 3.333... = 23.333..., but the output is:

3.33333349227905
6.66666698455811
3.33333325386047
6.66666650772095
23.3333339691162

Even though my summands are smaller than the values they represent, their sum is not. In this case, applying nextafterf to sumf gives 23.3333320617676, which is smaller. But does this always work? Could the accumulated rounding error grow so large that a single nextafterf still leaves me above the correct value?

I know that I could avoid this by implementing a fraction class and computing everything exactly, but I'm curious whether it is possible to achieve my goal with floats.
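For comparison, here is a minimal C++17 sketch of the exact route, which is also handy for checking any float candidate. The names Fraction, exact_sum and at_most are hypothetical. It assumes xi >= 0, yi > 0, that no intermediate product overflows int64_t, that the final denominator is below 2^29 (so f * den is exact in double: 24 significand bits times at most 29 bits fits in 53) and that the numerator is below 2^53.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical fraction type: den is always > 0 and kept in lowest terms.
struct Fraction {
    std::int64_t num;
    std::int64_t den;
};

// Exact sum of x[i] / y[i], reduced with std::gcd after each step.
// Assumes x[i] >= 0, y[i] > 0 and no int64_t overflow.
Fraction exact_sum(const std::vector<int>& x, const std::vector<int>& y) {
    Fraction s{0, 1};
    for (std::size_t i = 0; i < x.size(); ++i) {
        // s.num/s.den + x/y = (s.num*y + x*s.den) / (s.den*y)
        std::int64_t n = s.num * y[i] + static_cast<std::int64_t>(x[i]) * s.den;
        std::int64_t d = s.den * y[i];
        std::int64_t g = std::gcd(n, d);
        s = {n / g, d / g};
    }
    return s;
}

// True if f <= r.num / r.den exactly. The product f * r.den is exact in
// double when r.den < 2^29; r.num is exact in double when r.num < 2^53.
bool at_most(float f, Fraction r) {
    return static_cast<double>(f) * static_cast<double>(r.den)
           <= static_cast<double>(r.num);
}
```

With the question's data (x = {20, 20, 20, 10}, y = {3, 3, 3, 3}), exact_sum returns 70/3; at_most(23.3333339691162f, ...) is false and at_most(23.3333320617676f, ...) is true, matching the observation that only the nextafterf-adjusted sum lies below the exact value.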

Jerry Coffin (best answer)

My immediate reaction is that the approach you're taking is fundamentally flawed.

The problem is that with floating-point numbers, the size of the step nextafter takes depends on the magnitude of the numbers involved. Consider a somewhat extreme example:

#include <iostream>
#include <iomanip>
#include <cmath>

int main() { 
    float num = 1.0e-10f;
    float denom = 1.0e10f;

    std::cout << std::setprecision(7) << num - std::nextafterf(num, 0) << "\n";
    std::cout << std::setprecision(7) << denom - std::nextafterf(denom, 0) << "\n";
}

Result:

6.938894e-018
1024

Since the numerator is much smaller than the denominator, its step is also much smaller.

The consequence seems fairly clear: if you nudge both the numerator and the denominator toward zero, the quotient can come out larger than the unmodified one instead of slightly smaller.

If you want to ensure the result is smaller than the correct number, the obvious choice would be to round the numerator down but the denominator up (i.e., nextafterf(denom, INFINITY)). That way you get a smaller numerator and a larger denominator, so the result is always smaller than the unmodified version would have been.
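A minimal C++ sketch of that suggestion, assuming num >= 0 and den > 0; the function name quotient_below is hypothetical. It uses the float overload of std::nextafter and std::numeric_limits<float>::infinity() as the upward direction. Note that the division itself still rounds to nearest, and this sketch does not step the quotient down afterwards.

```cpp
#include <cmath>
#include <limits>

// Move the numerator one float toward zero and the denominator one float
// toward +infinity, then divide. Assumes num >= 0 and den > 0.
// The final division is still rounded to nearest.
float quotient_below(float num, float den) {
    const float smaller_num = std::nextafter(num, 0.0f);
    const float larger_den =
        std::nextafter(den, std::numeric_limits<float>::infinity());
    return smaller_num / larger_den;
}
```

For example, quotient_below(1.0f, 3.0f) yields a float below both 1.0f / 3.0f and the exact value 1/3.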

Support Ukraine

Try changing the float rounding mode to FE_TOWARDZERO.

See the code example at "Change floating point rounding mode".
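A minimal C++ sketch of the rounding-mode approach using std::fesetround from <cfenv>; the function name sum_toward_zero is hypothetical. For positive operands FE_TOWARDZERO behaves like FE_DOWNWARD. Assumptions: xi >= 0, yi > 0, and every xi and yi fits in 24 bits so the int-to-float conversions are exact (a denominator rounded toward zero would make the quotient larger). Compilers do not always honor a changed rounding mode: GCC assumes round-to-nearest unless given -frounding-math, and #pragma STDC FENV_ACCESS ON should be used where the compiler supports it. The volatile variables are a blunt workaround against compile-time folding, not a guarantee.

```cpp
#include <cfenv>
#include <cstddef>
#include <vector>

// Evaluate sum(x[i] / y[i]) with every rounding step done toward zero.
// Assumes x[i] >= 0, y[i] > 0, and |x[i]|, |y[i]| <= 2^24 (exact as float).
float sum_toward_zero(const std::vector<int>& x, const std::vector<int>& y) {
    const int saved = std::fegetround();
    std::fesetround(FE_TOWARDZERO);

    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        // volatile loads keep the compiler from folding the division at
        // compile time, where it would use round-to-nearest.
        volatile float fx = static_cast<float>(x[i]);
        volatile float fy = static_cast<float>(y[i]);
        sum += fx / fy;
    }
    // Store through volatile so the additions complete before the mode
    // is restored.
    volatile float result = sum;

    std::fesetround(saved);
    return result;
}
```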

chux - Reinstate Monica

The expression float result = float(x1)/y1 + float(x2)/y2 + ... + float(xn)/yn has 3 places where rounding may occur:

  1. Conversion of int to float: not always exact (for example, some integers larger than 2^24 in magnitude are not representable).
  2. Division: floating-point x / floating-point y.
  3. Addition: floating-point quotient + floating-point quotient.
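To make the three rounding points concrete, here is a small C++ sketch with one example of each, assuming IEEE 754 single-precision float and the default round-to-nearest mode. The function names are hypothetical.

```cpp
// 1. int -> float: 2^24 + 1 = 16777217 needs 25 significant bits, so it
//    rounds (to nearest, ties to even) to 16777216.
float convert_example() { return static_cast<float>(16777217); }

// 2. Division: 1/3 has no finite binary expansion; the nearest float is
//    slightly above 1/3.
float divide_example() { return 1.0f / 3.0f; }

// 3. Addition: the sum of two floats may not be representable either;
//    16777216.0f + 1.0f rounds back to 16777216.
float add_example(float a, float b) { return a + b; }
```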

By stepping to the next float at each of these places (up or down, as each part of the expression requires), the result will certainly be less than the exact mathematical value. This approach may not produce the float closest to the exact answer, but it will be close and certainly smaller.

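#include <float.h>
#include <math.h>
#include <stddef.h>

// Returns a value <= the exact sum of x[i]/y[i], assuming x[i] >= 0 and y[i] > 0.
float foo(const int *x, const int *y, size_t n) {
  float sum = 0.0f;
  for (size_t i=0; i<n; i++) {  // assume x[0] is x1, x[1] is x2 ...
    float fx = nextafterf(x[i], 0.0f);     // numerator stepped down
    float fy = nextafterf(y[i], FLT_MAX);  // denominator stepped up
    // divide the slightly smaller numerator by the slightly larger
    // denominator, then step the rounded quotient down
    float q = nextafterf(fx / fy, 0.0f);
    sum = nextafterf(sum + q, 0.0f);       // step each rounded partial sum down
  }
  return sum;
}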
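For C++ callers, a minimal port of the same steps might look like this. The name sum_below and the use of std::vector are illustrative choices, not part of the answer. It uses +infinity instead of FLT_MAX as the upward direction, which gives the same result for any denominator converted from an int. Inputs are assumed to satisfy x[i] >= 0 and y[i] > 0.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Same four steps as foo(): numerator down, denominator up, quotient down,
// partial sum down. Assumes x[i] >= 0 and y[i] > 0.
float sum_below(const std::vector<int>& x, const std::vector<int>& y) {
    const float inf = std::numeric_limits<float>::infinity();
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        float fx = std::nextafter(static_cast<float>(x[i]), 0.0f);
        float fy = std::nextafter(static_cast<float>(y[i]), inf);
        float q = std::nextafter(fx / fy, 0.0f);
        sum = std::nextafter(sum + q, 0.0f);
    }
    return sum;
}
```

With the question's data (x = {20, 20, 20, 10}, y = {3, 3, 3, 3}, exact sum 70/3), the result can be checked exactly with static_cast<double>(result) * 3.0 <= 70.0, since a float times 3 is exact in double.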