I tried three ways to sum a list of floats: the pandas sum method, a plain for loop, and math.fsum. Only math.fsum seems to give the right result; the first and second methods each give a different value.
import pandas as pd
import math

def main():
    data = [61.1, 19.3, 15.7, 3.07, .255, .158, .102, .072, .0608,
            .0048, .0416, .0368, .0288, .0128, .0112, .0096, .0096,
            .008, .0048, .004, .004, .0032, .0024, .0006]

    # Method 1: pandas sum
    df = pd.DataFrame(data, columns=['value'])
    print(df['value'].sum())

    # Method 2: plain for loop (note: the name sum shadows the built-in)
    sum = 0
    for x in data:
        sum += x
    print(sum)

    # Method 3: math.fsum
    print(math.fsum(data))

main()
Current results:
99.99999999999999
100.00000000000004
100.0
Expected results:
100.0
100.0
100.0
All three functions give different results because they use different algorithms for floating-point summation. Floating-point numbers have limited precision, so most of these values are stored as approximations and every addition can introduce a small rounding error. Which algorithm to use depends on the trade-off between performance and precision. You can learn more about the implementation of floating-point operations in this answer.
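As a small illustration of the limited precision mentioned above, the sketch below prints the exact value stored for the literal 0.1 and shows that the order of additions changes the rounded result. The variable names are only illustrative and are not part of the question's code.

```python
from decimal import Decimal

# Decimal(float) exposes the exact binary value stored for the literal 0.1,
# which is close to, but not equal to, one tenth.
stored = Decimal(0.1)
print(stored)

# Floating-point addition is not associative: grouping changes the rounding.
left_to_right = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
print(left_to_right, right_first, left_to_right == right_first)
```

Since any two summation algorithms add the values in a different order or with different intermediate roundings, they can legitimately disagree in the last digits.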
The sum += x approach simply adds each approximated value to a running total. It does the least work per element, but it is also the least accurate, because the rounding error of every addition accumulates.
df['value'].sum() uses np.sum() under the hood, which uses a pairwise summation algorithm. This is usually more accurate than the plain running sum, at the cost of a slightly more complex algorithm.
math.fsum is the most accurate of the three because it tracks the rounding error of the intermediate sums, but it also does the most work per element. You can learn more about its implementation here
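The three approaches described above can be compared side by side with a rough sketch. Note the assumptions: naive_sum and pairwise_sum are hypothetical helper names, and pairwise_sum is a textbook recursive version, not NumPy's actual implementation (which works on blocks and unrolls the inner loop), so its last digits need not match what pandas prints.

```python
import math

data = [61.1, 19.3, 15.7, 3.07, .255, .158, .102, .072, .0608,
        .0048, .0416, .0368, .0288, .0128, .0112, .0096, .0096,
        .008, .0048, .004, .004, .0032, .0024, .0006]

def naive_sum(values):
    """Running total: one rounded addition per element."""
    total = 0.0
    for x in values:
        total += x
    return total

def pairwise_sum(values):
    """Textbook pairwise summation: split in half, sum each half, add.

    Illustrative only; this is not how NumPy implements it internally.
    """
    n = len(values)
    if n == 0:
        return 0.0
    if n <= 2:
        return naive_sum(values)
    mid = n // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

print(naive_sum(data))     # the question reports 100.00000000000004 for this loop
print(pairwise_sum(data))  # may differ from the loop in the last digits
print(math.fsum(data))     # the question reports 100.0
```

If the printed values only need to look the same, rounding for display (for example round(total, 10)) or comparing with math.isclose is usually enough; math.fsum is the option to reach for when the sum itself has to be correctly rounded.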