How does the spacing of a column name affect its use in statsmodels regression?


I ran the following code. Whether I get an error seems to depend on the spacing of the column name in my Excel sheet, but I cannot see why. For example, when I renamed the column "y" to "Interest Rate" and ran the program again, I received an error. I then renamed it to "Interest_Rate", thinking that the column name needed to be a single word, but I still received an error. When I changed the column name back to "y", it worked fine. Do I need to adjust the spacing of the column name in the Excel sheet I'm importing into the DataFrame, or is something else wrong here? It looks like a spacing issue, but I don't know why I cannot use "Interest Rate" or "Interest_Rate" instead of "y" for my dependent-variable column, or how to fix it. Here is the code, with the column named "y":

# Import the libraries
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels.formula.api as smf

# Load the Excel sheet into a DataFrame
df = pd.read_excel("C:/Users/ME/OneDrive/Desktop/simple regress.xlsx")
print(df)

# Scatter plot of the dependent variable against Immigration
ax1 = df.plot.scatter(x='Immigration',
                      y='y',
                      c='DarkBlue')

# Use ols to build a regression model from the data in df and fit it
result = smf.ols(formula='y ~ Immigration', data=df).fit()

# Print the parameters/coefficients in the regression equation
print(result.params)

# print the regression analysis/results
print(result.summary())
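Since the error message is not shown, the cause can only be guessed at. The sketch below assumes two common culprits: a header that carries invisible leading or trailing whitespace, and a name containing a space, which a formula string cannot parse as a plain variable. The data is made up for illustration and stands in for the Excel sheet. `Q()` is the quoting helper available inside statsmodels formulas. Note that after renaming the column, every place that refers to it (the `y=` argument of `df.plot.scatter` as well as the formula) has to use the new name.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Made-up toy data standing in for the Excel sheet. The trailing space in
# "Interest Rate " is deliberate: it is invisible in Excel but breaks lookups.
df = pd.DataFrame({
    "Interest Rate ": [2.1, 3.9, 6.2, 7.8, 10.1],
    "Immigration": [1, 2, 3, 4, 5],
})

# Listing the headers shows the exact strings, including stray spaces.
print(df.columns.tolist())

# Option 1: strip surrounding whitespace, then quote the name with Q().
df_stripped = df.rename(columns=lambda name: name.strip())
result_q = smf.ols('Q("Interest Rate") ~ Immigration', data=df_stripped).fit()

# Option 2: also replace inner spaces so the name is a valid Python identifier
# and can be written in the formula without quoting.
df_clean = df_stripped.rename(columns=lambda name: name.replace(" ", "_"))
result_renamed = smf.ols("Interest_Rate ~ Immigration", data=df_clean).fit()

print(result_q.params)
print(result_renamed.params)
```

Both options fit the same model; the second is usually easier to read once the rest of the script uses the underscore name consistently.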