Python natural spline function cr in patsy only accepts 3 or more degrees of freedom, whereas ns in R accepts 2


I am trying to port this R code to Python:

> library(splines)
> x <- 0:10
> y <- x**2
> lm(y ~ ns(x, df = 2))

My Python attempt:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

x = np.arange(11)
data = pd.DataFrame({"x": x, "y": x**2})  # formula variables must be named columns
formula = "y ~ cr(x, df=3)"

res = smf.ols(formula, data=data).fit()
print(res.summary())

However, with this Python formulation I cannot set df < 3. Any suggestions for how to get a natural spline with two degrees of freedom in Python and use it in an R-style patsy formula?

Answer by Ben Bolker:

These two functions clearly generate different bases. I'm not sure exactly where the difference comes from, but the exploration below might help.

Note that cr mimics the basis construction from mgcv (see here; in addition to Simon Wood's book they are also discussed here), while ns() is a natural spline built on a B-spline basis. I believe that splines::bs() and patsy.bs would match perfectly, but there is no patsy.ns.
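Since there is no patsy ns(), one workaround is to write the basis yourself. patsy calls any function it can find in the calling namespace, the same way `np.log(x)` works in a statsmodels formula. The sketch below is an illustration under stated assumptions, not R's algorithm: `ns_basis` is a hypothetical helper that builds a natural cubic spline basis from truncated cubic powers (the construction described in The Elements of Statistical Learning) instead of from B-splines. It places knots the way R's `ns()` does by default (boundary knots at the range of x, df - 1 interior knots at evenly spaced quantiles). Its columns will not equal R's, but together with an intercept they should span the same space, so the fitted values of `y ~ ns_basis(x, df=2)` should agree with `lm(y ~ ns(x, df = 2))` even though the coefficients will not.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def ns_basis(x, df=2):
    """Hypothetical helper: natural cubic spline basis with no intercept column.

    Truncated-power construction (not R's B-spline-based one). Knots follow
    R's splines::ns() defaults as an assumption: boundary knots at the range
    of x and df - 1 interior knots at evenly spaced quantiles of x.
    Assumes the knots come out distinct.
    """
    x = np.asarray(x, dtype=float)
    if df < 1:
        raise ValueError("df must be at least 1")
    probs = np.linspace(0.0, 1.0, df + 1)[1:-1]  # df - 1 interior quantile levels
    knots = np.concatenate([[x.min()], np.quantile(x, probs), [x.max()]])
    last = knots[-1]

    def d(k):
        # d_k(x) = [(x - knot_k)_+^3 - (x - knot_last)_+^3] / (knot_last - knot_k)
        return (np.maximum(x - knots[k], 0.0) ** 3
                - np.maximum(x - last, 0.0) ** 3) / (last - knots[k])

    # Columns: x, then d_k - d_{K-2} for each k; each is linear outside the
    # boundary knots, which is what makes the spline "natural".
    n_knots = len(knots)
    columns = [x] + [d(k) - d(n_knots - 2) for k in range(n_knots - 2)]
    return np.column_stack(columns)


x = np.arange(11)
data = pd.DataFrame({"x": x, "y": x ** 2})

# patsy resolves ns_basis from the calling namespace, as it does np.log
res = smf.ols("y ~ ns_basis(x, df=2)", data=data).fit()
print(res.params)
print(res.fittedvalues)
```

Because `ns_basis` is a plain function rather than a patsy stateful transform, it recomputes the knots from whatever data it receives, so predicting on new data would silently use different knots. `patsy.stateful_transform` is the mechanism patsy provides for remembering them between fitting and prediction.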

x <- 0:10
X1 <- model.matrix(~splines::ns(x, df = 3))
matplot(x, X1, type = "l")

[Figure: the columns of X1 plotted against x]

import numpy as np
import pandas as pd
import patsy
import matplotlib.pyplot as plt
x = np.arange(11)
X2 = patsy.dmatrix('cr(x, df = 3)', {'x': x}, return_type='dataframe')
plt.plot(x, X2)
plt.show()

[Figure: the columns of X2 plotted against x]
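One check that may help explain the df limit (a sketch, assuming patsy's default of no constraints on `cr()`): the mgcv-style `cr()` basis parameterizes the spline by its values at the knots, so a constant function has every coefficient equal to one and the `cr()` columns sum to one at every x. With the intercept that `dmatrix` adds by default, the design matrix is therefore rank-deficient, and `cr(x, df = 3)` contributes only two dimensions beyond the intercept, the same number as `ns(x, df = 2)` in R. This does not show that the two fits coincide; that also depends on where each function places its knots, which this sketch does not check.

```python
import numpy as np
import patsy

x = np.arange(11)
# Same design matrix as above: an "Intercept" column plus three cr() columns
X2 = patsy.dmatrix("cr(x, df=3)", {"x": x}, return_type="dataframe")

cr_cols = X2.drop(columns="Intercept")
print(X2.shape)                                # rows x columns of the design matrix
print(np.allclose(cr_cols.sum(axis=1), 1.0))   # do the cr() columns sum to one?
print(np.linalg.matrix_rank(X2.to_numpy()))    # effective number of columns
```

patsy's `cr()` also accepts a `constraints` argument (for example `constraints='center'`) that absorbs a centering constraint into the basis; whether that changes the minimum df the question runs into is not verified here.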