Linear regression with Scikit-Learn vs GraphLab Create. Same data, different results

I have created a multiple linear regression model on some data (housing prices for the Seattle area) with GraphLab Create and another with Scikit-Learn. The training and test sets are chosen at random in both cases, using the same split ratio (80/20). However, the results are very different.

The mean absolute error on the test set is 106254.49 for the GraphLab model and 168980.44 for the Scikit-Learn model.

The code to create the GraphLab model is from an online course, so I assume it's correct. The code I wrote for the Scikit-Learn model is:

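from sklearn.linear_model import LinearRegression

# Fit ordinary least squares on the training split, then predict on the test split
model = LinearRegression().fit(train_features, train_target)
test_predictions = model.predict(test_features)
errors = abs(test_predictions - test_target)  # absolute error per test row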

I understand that the data for the two models is not exactly the same because both samples were chosen at random, but with a training set size of about 17k rows and a test set size of about 4k rows I wouldn't expect a big difference.
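To check how much the random split alone can move the error, a minimal sketch like the one below could be used. It runs on synthetic stand-in data rather than the housing data, and the names `mae_for_split`, `maes` and `spread` are made up for illustration. It repeats the 80/20 split with several seeds and reports the range of the test mean absolute error:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the housing data): 21,000 rows, 3 features.
# The coefficients, intercept and noise level are arbitrary assumptions.
rng = np.random.default_rng(0)
X = rng.normal(size=(21000, 3))
noise = rng.normal(scale=100000.0, size=21000)
y = X @ np.array([50000.0, 30000.0, -20000.0]) + 500000.0 + noise


def mae_for_split(seed):
    """Fit on a random 80/20 split and return the test mean absolute error."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    return mean_absolute_error(y_test, model.predict(X_test))


# Repeat the experiment with ten different random splits.
maes = [mae_for_split(seed) for seed in range(10)]
spread = max(maes) - min(maes)
print(f"MAE min={min(maes):.2f} max={max(maes):.2f} spread={spread:.2f}")
```

If the spread across seeds is small compared with the gap between the two libraries, the random split is unlikely to be the explanation. Passing the same `random_state` each time also makes a single split reproducible.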

Any suggestions? Am I doing something wrong with the Scikit linear regression?

In essence, I would like to replicate the GraphLab model using Scikit-Learn and get very similar performance.

Thanks
