I am currently working with the Optuna library and I have seen that there is a parameter which allows pruning of unpromising trials. It seems that this feature can only be used with incremental learning methods such as the SGD classifier, or with neural networks. Hence, I was wondering: is it possible to prune trials when using a random forest, a decision tree (CART), or even a logistic regression?
Thanks a lot! :)
PS: I did not find any example on the internet which uses a random forest with pruned trials in Optuna...
`SGDClassifier` with `loss='log_loss'` performs logistic regression, enabling you to use incremental learning for logistic regression.

As for random forests and decision trees: they are batch learners, so trial pruning doesn't apply directly. However, you can wrap a batch learner in a class (`PseudoIncrementalBatchLearner`, sketched below) that refits the learner on more and more data each time you call `partial_fit()`. This is similar to how a learning curve is generated, where the estimator is refitted on increasing portions of the dataset.

In general, as a learner is fit on larger portions of the data, its generalisation error will go down, and the trend is somewhat predictable for an individual estimator. However, when comparing learners, you might like to prune ones that are relatively slow to improve, and which are too expensive to train on the entire dataset... this is where `PseudoIncrementalBatchLearner` could be useful.
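A minimal sketch of such a wrapper (the implementation details, such as the `n_steps` parameter and the equal-sized increments, are assumptions of mine rather than a fixed recipe; it also assumes `X` and `y` are already shuffled):

```python
import numpy as np
from sklearn.base import BaseEstimator, clone


class PseudoIncrementalBatchLearner(BaseEstimator):
    """Wraps a batch learner so that each call to partial_fit()
    refits a clone of it on a larger portion of the data,
    similar to how a learning curve is generated."""

    def __init__(self, estimator, n_steps=10):
        self.estimator = estimator
        self.n_steps = n_steps

    def partial_fit(self, X, y):
        # Track how many times partial_fit() has been called so far.
        self.step_ = getattr(self, "step_", 0)

        # Refit on an increasing fraction of the (pre-shuffled) data:
        # 1/n_steps, 2/n_steps, ..., up to the full dataset.
        frac = min(self.step_ + 1, self.n_steps) / self.n_steps
        n_samples = int(len(X) * frac)
        self.estimator_ = clone(self.estimator).fit(X[:n_samples], y[:n_samples])

        self.step_ += 1
        return self

    def predict(self, X):
        return self.estimator_.predict(X)
```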
The data and example below show how the blue random forest is slow to improve compared to the orange one, and is therefore a candidate for early pruning. This avoids having to train that learner on the full dataset (although at the end the two are comparable).
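A hedged sketch of how the wrapper could be plugged into an Optuna study with a pruner; the synthetic dataset, the hyperparameter ranges, and the `MedianPruner` choice are illustrative assumptions, not part of the original example:

```python
import optuna
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Placeholder data; train_test_split also shuffles, as the wrapper assumes.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

N_STEPS = 10


def objective(trial):
    rf = RandomForestClassifier(
        max_depth=trial.suggest_int("max_depth", 2, 16),
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        random_state=0,
    )
    model = PseudoIncrementalBatchLearner(rf, n_steps=N_STEPS)

    for step in range(N_STEPS):
        model.partial_fit(X_train, y_train)  # refit on a larger portion
        score = accuracy_score(y_val, model.predict(X_val))

        trial.report(score, step)   # report the intermediate score
        if trial.should_prune():    # let the pruner stop slow trials early
            raise optuna.TrialPruned()

    return score


study = optuna.create_study(
    direction="maximize", pruner=optuna.pruners.MedianPruner()
)
study.optimize(objective, n_trials=20)
```

With `trial.report()` supplying an intermediate score after each refit, any of Optuna's pruners can cut off a trial that improves too slowly, which is exactly the behaviour described above.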