EXCEL LinEst function returns nonsense when Const=False and numbers <1E-7

113 Views Asked by At

To reproduce my problem one can simply type in A1: 9.1E-8 and in A2: 9.2E-8 and then mark both cells and drag down (autofill) to A25. Then type in B1: "=1.1*A1" and copy the formula down, so we have 2 columns with 25 numbers each.

enter image description here

Now try the formula =LINEST(A1:A25,B1:B25,FALSE,TRUE) The result looks like this:

enter image description here

When you change now const=TRUE , the formula works quite as it should do giving me a slope of 0.909 (=1/1.1).

As a workaround I can also just multiply my numbers all with e.g. 1E9. That works.

Any other ideas?

Imho EXCEL should not behave like this. Already for years technology works in the scale of nm, nA, ns, nC, nF so I would expect that a modern product can handle 91n* as Float correctly.

3

There are 3 best solutions below

0
jkpieterse On

The Sum of Squares method doesn't work well with numbers close to zero. IN fact it also works quite bad when numbers differ by a (very) small fraction. I advise to "normalize" (or scale) your data by a formula like this: =(A2-AVERAGE(A$2:A$26))/AVERAGE(A$2:A$26) and then compute the LINEST. You should see it now produces plausible results. You'll have to convert the constants it gives back to the actual values though.

0
DataBunny On

What you see is not an Excel error per se. It goes to the root of how linear regression is calculated. For a excellent, very extensive and quite old discussion, see

https://stats.stackexchange.com/questions/26176/removal-of-statistically-significant-intercept-term-increases-r2-in-linear-mo

To summarize: linear regression looks for best fit of all y = a + bx. If x is source variable, y is target variable, then a (intercept) and b (coefficient) is what you want to calculate so that the R2 (residual, or sum of all unexplained differences squared) will be as small as possible.

Now what happens if you FORCE intercept a to ZERO ?

Well, to summarize the above in a short and plain way, your fit quality will probably be reported as better, but only because your R2 will no longer be calculated as it is supposed to be. Оnce it uses zero (0.0000) for a, R2 loses all its fundamental meaning and essentially becomes similar to white noise.

In many cases, there is a fundamental conflict between regression logic and material nature of data. I deal with this a lot because my subject is frequently money. Suppose we have a sum of investment in a company and a company valuation (like IPO). Under no circumstances can a zero valuation produce a non-zero investment. So forcing the intercept breaks a regression, and not forcing an intercept breaks financial logic!

See another extensive discussion here:

https://stats.stackexchange.com/questions/102709/when-forcing-intercept-of-0-in-linear-regression-is-acceptable-advisable

Thus said, Excel in past did not handle this issue well, but as of my experience, in Excel 365 it is already consistent and correct - as much as it can be given the underlying math and logic. See another long read on evolution of LINEST() in Excels since Office 2000.

https://www.informit.com/articles/article.aspx?p=2019170

0
Martin Brown On

As @jkpieterse has suggested LINEST is simple minded, makes no attempt to ensure numerical stability and so is very poor at fitting a linear approximation to data that has a significant x offset away from the origin. People usually trip up over this gotcha when using dates on the X axis.

The much better polynomial fit inside the XY charting software however is well able to get it right! MS did at one point break their quite decent charting polynomial fit to make it "agree" with the pathetic attempt that LINEST() provides in XL2007 (original out of the box edition). That was quickly changed back again.

Below is the result of using Excel charts polynomial fit for a linear equation (beware that it too can also go a bit haywire for anything more challenging than a cubic). Excel linear fit with origin included from Chart

The only thing you need to remember is to display enough significant digits on the fitted polynomial to reproduce the model results in real applications. Format trendline label allows you to do this.

LINEST() has been wrong and pretty useless ever since Excel was launched. It is unclear why they refuse to fix it when it is so obviously and easily broken. Perhaps nobody ever complains about it...