How to improve a Random Forest regression model based on its actual vs. predicted pattern

I'm fairly new to ML and could use your collective advice here. I have a regression problem (the outcome is price) and I am using a random forest model, sklearn.ensemble.RandomForestRegressor.
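A minimal sketch of this kind of setup, for reference. It assumes a synthetic dataset from make_regression as a stand-in for the real price data (which is not included in the question), plus arbitrary choices of n_estimators=200 and a 75/25 train/test split. Comparing R² on the two splits is one way to put a number on how far apart training and test performance are.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, random_state=0):
    """Fit a random forest on a training split; return (model, train R^2, test R^2)."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )
    model = RandomForestRegressor(
        n_estimators=200, random_state=random_state, n_jobs=-1
    )
    model.fit(X_train, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    return model, train_r2, test_r2


# Synthetic stand-in for the price data (assumption: the real data is not available).
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
model, train_r2, test_r2 = fit_and_score(X, y)
print(f"train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

A large gap between the two scores would usually point to overfitting; a small gap with a modest test score points more toward missing signal in the features.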

The results I'm getting are shown in the scatter plot here:

[Figure: scatter plot of actual vs. predicted price]

On the training set (blue), the model does not appear to be overfit and performs reasonably well. On the test set (orange), the model seems to overestimate low actual values and underestimate high actual values.
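One way to quantify this pattern, sketched with two hypothetical helper functions (the names are illustrative, not part of any library). A random forest prediction is an average of training targets that fall in the same leaves, so it can never exceed the largest or fall below the smallest training target, and averaging tends to pull extreme values toward the mean. That shows up as a slope below 1 when predictions are regressed on actual values, and as positive mean error in the lowest quantile bins and negative mean error in the highest ones.

```python
import numpy as np


def calibration_slope(y_true, y_pred):
    """Slope of a least-squares line fitted to (actual, predicted) pairs.

    A slope near 1 means predictions track the actual values; a slope well
    below 1 means low values are over-predicted and high values are
    under-predicted (predictions compressed toward the mean).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    slope, _intercept = np.polyfit(y_true, y_pred, deg=1)
    return slope


def mean_error_by_quantile(y_true, y_pred, n_bins=5):
    """Mean (predicted - actual) within quantile bins of the actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    edges = np.quantile(y_true, np.linspace(0, 1, n_bins + 1))
    # Digitizing against the inner edges yields bin indices 0 .. n_bins - 1.
    bins = np.digitize(y_true, edges[1:-1], right=True)
    return [float(np.mean(y_pred[bins == b] - y_true[bins == b])) for b in range(n_bins)]


# Toy illustration: predictions shrunk halfway toward the mean.
y = np.linspace(0, 100, 101)
shrunk = y.mean() + 0.5 * (y - y.mean())
print(calibration_slope(y, shrunk))
print(mean_error_by_quantile(y, shrunk))
```

On real data these would be called with the test-set actual prices and the model's test-set predictions.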

I'm trying to interpret this behavior and get some ideas on how to tune the model or adjust the pre-processing (outliers, nulls, etc.).
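A possible starting point for the tuning and pre-processing side, sketched under several assumptions that are not in the question: missing feature values are imputed with the median, the target is modelled on a log scale (this assumes prices are positive and right-skewed), and a small grid over min_samples_leaf and max_features is searched with 5-fold cross-validation on mean absolute error. The grid values are arbitrary illustrations rather than recommendations for this dataset, and outlier handling is data-specific so it is not shown.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline


def build_search(random_state=0):
    # Impute missing feature values with the column median, then fit the forest.
    forest = make_pipeline(
        SimpleImputer(strategy="median"),
        RandomForestRegressor(n_estimators=200, random_state=random_state),
    )
    # Assumption: prices are positive and right-skewed, so fit on log1p(price)
    # and map predictions back to the price scale with expm1.
    model = TransformedTargetRegressor(
        regressor=forest, func=np.log1p, inverse_func=np.expm1
    )
    # Illustrative grid over parameters that control how much each tree averages.
    param_grid = {
        "regressor__randomforestregressor__min_samples_leaf": [1, 3, 5],
        "regressor__randomforestregressor__max_features": [1.0, 0.5, "sqrt"],
    }
    return GridSearchCV(
        model, param_grid, cv=5, scoring="neg_mean_absolute_error", n_jobs=-1
    )
```

The search would be fitted on the training split only (e.g. search.fit(X_train, y_train), with the question's own data), then search.best_params_ inspected and the refitted search.best_estimator_ evaluated once on the held-out test set.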

Since I'm fairly new to this, this may be a naive or overly broad question, but any suggestions are appreciated.

Thank you!
