I am currently a bit lost with back-transforming a log-transformed response in random forests. I started using a log transform for the dependent variable because it is skewed.
I fit a simple bagging random forest and a "tuned" forest that performs better on the test data than the bagging model (RMSE of 0.2058421 for the bagging model vs. 0.2004765 for the random forest), but when I back-transform the predictions to the original scale using
sqrt(mean((exp(cars_test$Price) - exp(bag_prediction))^2))
sqrt(mean((exp(cars_test$Price) - exp(rf_prediction))^2))
I get an RMSE of 5050.66 with the bagging model vs an RMSE of 5178.076 with the random forest.
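For context, here is roughly the full comparison I am running (a minimal sketch using the randomForest package; Price in cars_train/cars_test is already log-transformed, and the mtry values are only illustrative of bagging vs. a tuned forest):

library(randomForest)

p <- ncol(cars_train) - 1                      # number of predictors

# bagging = a random forest that considers all predictors at every split
bag_model <- randomForest(Price ~ ., data = cars_train, mtry = p)

# "tuned" forest with a smaller mtry (the value here is just a placeholder)
rf_model <- randomForest(Price ~ ., data = cars_train, mtry = floor(p / 3))

bag_prediction <- predict(bag_model, newdata = cars_test)
rf_prediction <- predict(rf_model, newdata = cars_test)

# RMSE on the log scale, i.e. the scale the models were trained on
sqrt(mean((cars_test$Price - bag_prediction)^2))
sqrt(mean((cars_test$Price - rf_prediction)^2))

# RMSE after back-transforming predictions and actuals to the price scale
sqrt(mean((exp(cars_test$Price) - exp(bag_prediction))^2))
sqrt(mean((exp(cars_test$Price) - exp(rf_prediction))^2))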
This is where I am lost... Why is the bagging model suddenly better? Is there something I should change in the way I am back-transforming the log price, or is this just how it works?
I have looked at a few other Stack Overflow questions about back-transforming a logged response, and some of them recommended the approach I described above. However, none of them mentioned anything about the better model coming out worse after the back-transformation.