Cubic model coefficient drastically different that what it actually is but provided correct prediction

48 Views Asked by At

I tried to fit a cubic model and use it for prediction in R, while the predictions provided are accurate, the coefficient of the model doesn't make any sense given the predicted values.

Below is the output when I run summary(lm(Blur ~ poly(logMAR, 3), data = df)):

Call:
lm(formula = Blur ~ poly(logMAR, 3), data = df)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.7838  -2.0661  -0.0868   2.3009  23.8918 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)       35.0000     0.7775  45.015   <2e-16 ***
poly(logMAR, 3)1 230.6529     6.7782  34.029   <2e-16 ***
poly(logMAR, 3)2  94.3878     6.7782  13.925   <2e-16 ***
poly(logMAR, 3)3  19.6777     6.7782   2.903   0.0049 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.778 on 72 degrees of freedom
Multiple R-squared:  0.9497,    Adjusted R-squared:  0.9476 
F-statistic: 453.4 on 3 and 72 DF,  p-value: < 2.2e-16

Subsequently, when calling coef(model), it returned:

     (Intercept) poly(logMAR, 3)1 poly(logMAR, 3)2 poly(logMAR, 3)3 
        35.00000        230.65294         94.38777         19.67767 

Which is accurate to the summary, but when I call for predict(model, newdata = data.frame(logMAR = c(0.37, 0.74, 1.15)) it returned:

        1         2         3 
 3.492585 11.469869 31.099493 

However, if I calculate the prediction manually according to the coefficients, the answers would be:

[1] 134.2600 265.3438 455.0060

After various comparisons with plots, the smaller output makes way more sense, and the coefficients from R's summary seem to be too large, making the manual calculation inaccurate. I sanity-checked the model using alternative software and online tools, which returned smaller coefficients of β0 = 1.3415, β1 = 0.9073, β2 = 9.2517 and β3 = 10.8354.

Manual calculation with the smaller coefficients aligns with the predict() function in R.

        1         2         3 
 3.492585 11.469869 31.099493 

Is there a reason why the coefficients are that big in R's summary(), am I doing the manual calculation incorrectly or looking at the wrong place?

1

There are 1 best solutions below

0
thothal On

As said in the comments, poly uses orthogonal polynomials by default and not simply a0 + a1 * x + a2 * x ^ 2 + a3 * x ^ 3. You can, however, ask for raw polynomials.

The following toy example illustrates the issue:

set.seed(123)
d <- data.frame(x = runif(100))
d$y <- (d$x + 4) ^ 3 + rnorm(100, sd = .2)

mo <- lm(y ~ poly(x, degree = 3), data = d)
mr <- lm(y ~ poly(x, degree = 3, raw = TRUE), data = d)

bd <- data.frame(x = 2)
predict(mo, bd)
#        1 
# 220.7312
predict(mr, bd) ## same
#        1 
# 220.7312

sum(coef(mr) * 2 ^ (0:3)) ## raw polys can be interpreted like this
# [1] 220.7312
sum(coef(mo) * 2 ^ (0:3)) ## however, orthogonal ones not --> wrong
# [1] 480.2738
sum(coef(mo) * c(1, predict(poly(d$x, degree = 3), 2))) ## but you can use them this way
# [1] 220.7312