Why are double precision and single precision floating point numbers sometimes equal and sometimes not?


I was wondering why double precision and single precision numbers are sometimes equal and sometimes not. For example, with the following they are not equal:

import numpy as np

x = np.float64(1./3.)
y = np.float32(1./3.)
print(x == y)  # False

but the following are equal:

x = np.float64(3.)
y = np.float32(3.)
print(x == y)  # True

I understand why the first pair of x and y is not equal, but I am not sure why the second pair is equal.


Answer by Patricia Shanahan (best answer)

This answer assumes single is IEEE 754 32-bit binary floating point, and double is the corresponding 64-bit type.

Any value that can be represented exactly in a single can also be represented exactly as a double. That is the case for 3.0. The closest single and the closest double both have value exactly 3, and are equal.

If a number cannot be represented exactly in a single, the double is likely to be a closer approximation and different from the single. That is the case for 1.0/3.0. The closest single is 0.3333333432674407958984375. The closest double is 0.333333333333333314829616256247390992939472198486328125.
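The exact values quoted above can be checked with a short sketch using only the Python standard library. `to_single` is a helper name made up for this illustration; it rounds a Python float (a double) to the nearest single by packing it into 4 bytes, which I would expect to match what `np.float32` stores.

```python
import struct
from decimal import Decimal

def to_single(x):
    """Round a Python float (a double) to the nearest single, returned as a double.

    Hypothetical helper for illustration only.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]

double_third = 1.0 / 3.0
single_third = to_single(double_third)

# Decimal(float) converts exactly, so every stored digit is shown.
print(Decimal(single_third))         # 0.3333333432674407958984375
print(Decimal(double_third))         # 0.333333333333333314829616256247390992939472198486328125
print(single_third == double_third)  # False

# 3.0 is exactly representable in both formats, so nothing changes.
print(to_single(3.0) == 3.0)         # True
```

Since the result of `to_single` is widened back to a double without loss, comparing it with the original double is the same comparison the question makes between `np.float32` and `np.float64`.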

Both single and double are binary floating point. A number cannot be expressed exactly unless it is equal to a fraction of the form A/(2**B), where A is an integer, B is a natural number, and "**" represents exponentiation. Numbers such as 0.1 and 0.2 that are terminating decimal fractions but not terminating binary fractions behave like 1.0/3.0. For example, the closest single to 0.1 is 0.100000001490116119384765625, and the closest double is 0.1000000000000000055511151231257827021181583404541015625.
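As a rough check of the A/(2**B) condition, the sketch below uses `fractions.Fraction` to look at the exact value of a decimal literal. The helper names `is_power_of_two` and `stored_exactly` are made up for this example. Note that a power-of-two denominator is only a necessary condition; the numerator and exponent must also fit the format.

```python
from fractions import Fraction

def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (hypothetical helper)."""
    return n > 0 and n & (n - 1) == 0

def stored_exactly(decimal_text):
    """True if the decimal literal converts to a double without any rounding."""
    return Fraction(float(decimal_text)) == Fraction(decimal_text)

for text in ["3.0", "0.75", "0.1", "0.2"]:
    exact = Fraction(text)  # the value exactly as written, in lowest terms
    print(text, "=", exact,
          "| power-of-two denominator:", is_power_of_two(exact.denominator),
          "| stored exactly:", stored_exactly(text))
```

Here 3.0 and 0.75 (which is 3/4) pass both checks, while 0.1 and 0.2 have denominators 10 and 5 and are rounded when stored.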

Answer by aka.nice

Imagine you have to represent 1/3 in base 10 with a limited number of digits.

With 2 digits (let's call this single precision), it will be 0.33
With 4 digits (double precision) it will be 0.3333
So the two approximations are not equal.
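The decimal analogy can be tried directly with Python's `decimal` module, which lets you pick the number of significant digits. This is only an illustration of the analogy, not of how binary floating point is implemented.

```python
from decimal import Context, Decimal

two_digits = Context(prec=2)   # stands in for "single precision"
four_digits = Context(prec=4)  # stands in for "double precision"

third_2 = two_digits.divide(Decimal(1), Decimal(3))
third_4 = four_digits.divide(Decimal(1), Decimal(3))
print(third_2, third_4, third_2 == third_4)  # 0.33 0.3333 False

# 1/2 needs only one digit, so both precisions hold it exactly.
half_2 = two_digits.divide(Decimal(1), Decimal(2))
half_4 = four_digits.divide(Decimal(1), Decimal(2))
print(half_2, half_4, half_2 == half_4)      # 0.5 0.5 True
```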

Now transpose this to representing 1/5 in base 2. It also needs an infinite number of bits (binary digits): 0.001100110011...

With a 24-bit significand (IEEE 754 single precision) and a 53-bit significand (double precision), the two floating point approximations will be different.

The same goes for 1/3.
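To see the repeating binary expansion, here is a small sketch that produces the bits of 1/5 by repeated doubling, then compares the single and double approximations. `binary_digits` and `to_single` are names made up for this example; `to_single` rounds a double to the nearest single via the `struct` module.

```python
import struct
from fractions import Fraction

def binary_digits(value, count):
    """First `count` binary digits after the point of a fraction in [0, 1)."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value >= 1)  # the next bit is the integer part after doubling
        digits.append(str(bit))
        value -= bit
    return "".join(digits)

def to_single(x):
    """Round a double to the nearest single (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print("0." + binary_digits(Fraction(1, 5), 12))  # 0.001100110011
print(to_single(0.2) == 0.2)                     # False
print(to_single(1 / 3) == 1 / 3)                 # False
```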

If the number can be represented exactly, without approximation, in single precision, then both representations will be equal.

That means a numerator that fits in at most 24 bits (not counting trailing zeros) and a denominator that is a power of 2, provided the exponent is not too large in either the numerator or the denominator.

For example, 1/2, 3/2, 5/2, ..., 1/4, 3/4, 5/4, etc. will have equal representations.
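The rule above can be sketched as a check on the exact value of a double. The function name `fits_single_significand` is made up for this example, and it deliberately ignores the exponent range, so treat it as a simplification rather than a full representability test.

```python
import struct
from fractions import Fraction

def to_single(x):
    """Round a double to the nearest single (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fits_single_significand(x):
    """True if the numerator of x, without trailing zero bits, needs at most 24 bits.

    Fraction(x) is the exact value of the double, so its denominator is
    always a power of 2. The exponent range is not checked here.
    """
    numerator = abs(Fraction(x).numerator)
    while numerator and numerator % 2 == 0:
        numerator //= 2  # drop trailing zero bits
    return numerator.bit_length() <= 24

samples = [1 / 2, 3 / 2, 5 / 2, 1 / 4, 3 / 4, 5 / 4, 1 / 3]
for x in samples:
    # The two columns should agree: the predicted fit and the actual round trip.
    print(x, fits_single_significand(x), to_single(x) == x)
```

Every sample except 1/3 passes both checks; the double closest to 1/3 has a 53-bit numerator, so it cannot survive the trip through single precision.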

2^24+1 won't have the same representation (it needs 25 bits).
But 2^60 will.
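These two cases are easy to confirm with a quick sketch. `to_single` is again a helper name made up for illustration; it rounds a double to the nearest single via the `struct` module.

```python
import struct

def to_single(x):
    """Round a double to the nearest single (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

n = 2**24 + 1                         # 16777217
print(float(n) == n)                  # True: the double holds it exactly
print(to_single(float(n)))            # 16777216.0: the single had to round
print(to_single(2.0**60) == 2.0**60)  # True: a power of 2 needs only one bit
```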

There are other cases where the representation is inexact but the approximation is the same:
2^54+1 will have the same float and double approximation.
So will 1+2^-60, for example.
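A sketch of these last two cases, using exact integers and `Fraction` for the true values so that the rounding happens only at the conversion step. `to_single` is a made-up helper that rounds a double to the nearest single.

```python
import struct
from fractions import Fraction

def to_single(x):
    """Round a double to the nearest single (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

big_exact = 2**54 + 1                    # exact integer
big_double = float(big_exact)            # rounds to 2**54
big_single = to_single(big_double)
print(big_double == big_exact)           # False: the double is inexact
print(big_single == big_double)          # True: both round to 2**54

tiny_exact = 1 + Fraction(1, 2**60)      # exact rational
tiny_double = float(tiny_exact)          # rounds to 1.0
tiny_single = to_single(tiny_double)
print(tiny_double == tiny_exact)         # False: the double is inexact
print(tiny_single == tiny_double)        # True: both round to 1.0
```

In both cases the extra term is too small to reach even the last bit of the 53-bit significand, so the double and the single land on the same value.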