I am relatively new to Python. I am trying to do a multivariate linear regression, then plot a scatter plot and the line of best fit using one feature at a time.
This is my code:
Train = df.loc[:650]
valid = df.loc[651:]
x_train = Train[['lag_7', 'rolling_mean', 'expanding_mean']].dropna()
y_train = Train['sales'].dropna()
y_train = y_train.loc[7:]
x_test = valid[['lag_7', 'rolling_mean', 'expanding_mean']].dropna()
y_test = valid['sales'].dropna()
regr = linear_model.LinearRegression()
regr.fit(x_train, y_train)
y_pred = regr.predict(x_test)
plt.scatter(x_test['lag_7'], y_pred, color='black')
plt.plot(x_test['lag_7'], y_pred, color='blue', linewidth=3)
plt.show()
And this is the graph that I'm getting:
I have searched a lot, but to no avail. I want to understand why this does not show a line of best fit and why it instead connects all the points on the scatter plot.
Thank you!

Assuming your graphical library is matplotlib, imported with import matplotlib.pyplot as plt, the problem is that you passed the same data to both plt.scatter and plt.plot. The former draws the scatter plot, while the latter passes a line through all points in the order given (it first draws a straight line between (x_test['lag_7'][0], y_pred[0]) and (x_test['lag_7'][1], y_pred[1]), then one between (x_test['lag_7'][1], y_pred[1]) and (x_test['lag_7'][2], y_pred[2]), etc.).

Concerning the more general question about how to do multivariate regression and plot the results, I have two remarks:
Finding the line of best fit one feature at a time amounts to performing 1D regression on that feature: it is an altogether different model from the multivariate linear regression you want to perform.
I don't think it makes much sense to split your data into train and test samples, because linear regression is a very simple model with little risk of overfitting. In the following, I consider the whole data set df.

I like to use OpenTURNS because it has built-in linear regression viewing facilities. The downside is that to use it, we need to convert your pandas tables (DataFrame or Series) to OpenTURNS objects of the class Sample.

You did not provide your data, so I need to generate some:
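A minimal sketch of such a data set, using NumPy and pandas rather than OpenTURNS. The column names come from the question; the sample size, coefficients, noise level, and the helper name make_example_data are assumptions chosen purely for illustration:

```python
import numpy as np
import pandas as pd


def make_example_data(n_rows=500, seed=0):
    """Build a synthetic stand-in for df (every number here is an assumption)."""
    rng = np.random.default_rng(seed)
    # Three independent features, named after the columns in the question.
    features = pd.DataFrame(
        rng.normal(size=(n_rows, 3)),
        columns=["lag_7", "rolling_mean", "expanding_mean"],
    )
    # The output depends on all three features, plus a little noise.
    noise = rng.normal(scale=0.3, size=n_rows)
    sales = (
        10.0
        + 2.0 * features["lag_7"]
        - 2.0 * features["rolling_mean"]
        + 1.5 * features["expanding_mean"]
        + noise
    )
    return features.assign(sales=sales)


df = make_example_data()
print(df.head())
```

Because the output depends on all three features to a similar degree, no single feature should explain it well on its own.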
Now, let us find the best-fitting line one feature at a time (1D linear regression):
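One way to sketch this step with NumPy and matplotlib instead of OpenTURNS. The data are synthetic (three independent features with made-up coefficients), and fit_line is a hypothetical helper name; np.polyfit with deg=1 fits one least-squares line per feature:

```python
import numpy as np
import matplotlib.pyplot as plt

FEATURES = ["lag_7", "rolling_mean", "expanding_mean"]

# Synthetic stand-in data (assumption): the output depends on all three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 10.0 + X @ np.array([2.0, -2.0, 1.5]) + rng.normal(scale=0.3, size=500)


def fit_line(x, y):
    """Least-squares line y ~ slope * x + intercept, together with its R^2."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2


fig, axes = plt.subplots(
    1, len(FEATURES), figsize=(12, 4), sharey=True, squeeze=False
)
r2_by_feature = {}
for ax, name, x in zip(axes[0], FEATURES, X.T):
    slope, intercept, r2 = fit_line(x, y)
    r2_by_feature[name] = r2
    ax.scatter(x, y, color="black", s=8)
    # A straight line only needs its two end points.
    x_ends = np.array([x.min(), x.max()])
    ax.plot(x_ends, slope * x_ends + intercept, color="blue", linewidth=3)
    ax.set_xlabel(name)
    ax.set_title(f"R^2 = {r2:.2f}")
axes[0, 0].set_ylabel("sales")
```

Call plt.show() to display the figure. Each line is drawn from just two end points computed from the fitted slope and intercept, so plt.plot has no scattered points to zigzag between.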
As you can see, in this example, none of the one-feature linear regressions predicts the output very accurately.
Now let us do multivariate linear regression. To plot the result, it is best to view the actual vs. predicted values.
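A sketch of the multivariate fit and the actual vs. predicted plot, again with NumPy and matplotlib in place of OpenTURNS and with the same kind of synthetic data (an assumption). np.linalg.lstsq solves the least-squares problem on a design matrix whose first column of ones carries the intercept:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in data (assumption): the output depends on all three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 10.0 + X @ np.array([2.0, -2.0, 1.5]) + rng.normal(scale=0.3, size=500)

# Design matrix: a column of ones for the intercept, then the three features.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_fit = A @ coef
r2 = 1.0 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)

fig, ax = plt.subplots(figsize=(5, 5))
ax.scatter(y, y_fit, color="black", s=8)
# Diagonal of perfect prediction: predicted value equals actual value.
limits = [y.min(), y.max()]
ax.plot(limits, limits, color="blue", linewidth=2)
ax.set_xlabel("actual sales")
ax.set_ylabel("predicted sales")
ax.set_title(f"Multivariate fit, R^2 = {r2:.2f}")
```

Call plt.show() to display the figure. The closer the points lie to the diagonal, the better the model reproduces the data.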
As you can see, in this example, the fit is much better with multivariate linear regression than with 1D regressions one feature at a time.