Why does AdaBoost or GradientBoosting ensemble with a single estimator give different values than the single estimator?

Asked by David R on 17 May 2022 at 22:05 (249 views)

I'm curious why a single-estimator AdaBoost "ensemble", a single-estimator gradient-boosted "ensemble", and a single decision tree give different values.

The code below compares three models, all using the same base estimator (a regression tree with max_depth=4 and an MSE-based split criterion):

The base estimator as a bare tree model
A single-estimator AdaBoost using the base estimator as a prototype
A single-estimator GBR configured with the same tree parameters

Extracting and inspecting the trees shows that they are very different, even though each should have been trained in the same fashion.

from sklearn.datasets import load_diabetes
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

data = load_diabetes()
X = data['data']
y = data['target']

# Three models that should, in principle, all reduce to one depth-4 tree
simple_model = DecisionTreeRegressor(max_depth=4)
prototype = DecisionTreeRegressor(max_depth=4)
simple_ada = AdaBoostRegressor(prototype, n_estimators=1)
# criterion='mse' was renamed to 'squared_error' in scikit-learn 1.0
# and the old name was removed in 1.2
simple_gbr = GradientBoostingRegressor(max_depth=4, n_estimators=1, criterion='squared_error')

simple_model.fit(X, y)
simple_ada.fit(X, y)
simple_gbr.fit(X, y)

# Pull the single fitted tree out of each ensemble
ada_one = simple_ada.estimators_[0]
gbr_one = simple_gbr.estimators_[0][0]  # estimators_ is a 2-D array of trees

print(export_text(simple_model))
print(export_text(ada_one))
print(export_text(gbr_one))

Answered by Ben Reiniger on 18 May 2022 at 16:48 (best answer)

AdaBoostRegressor performs weighted bootstrap sampling for each of its trees (unlike AdaBoostClassifier, which IIRC just fits the base classifier using sample weights); see the scikit-learn source. So there's no way to force a single-tree AdaBoost regressor to match a single decision tree (short of, I suppose, doing the bootstrap sampling manually and fitting the single decision tree on that sample).
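A minimal sketch of what "doing the bootstrap sampling manually" could look like. It assumes the first boosting round starts from uniform sample weights, and it uses a hypothetical seed of 0; it does not reproduce AdaBoostRegressor's internal random stream, so the resulting tree will not match `simple_ada.estimators_[0]` exactly. It only illustrates why a tree fit on a resample differs from one fit on the full data.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
n_samples = X.shape[0]

# Uniform weights, as in the first boosting round (assumption stated above)
weights = np.full(n_samples, 1.0 / n_samples)

# Hypothetical seed; not the stream AdaBoostRegressor uses internally
rng = np.random.default_rng(0)
boot_idx = rng.choice(n_samples, size=n_samples, replace=True, p=weights)

# One tree on the resample, one on the full data, same tree settings
boot_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X[boot_idx], y[boot_idx])
full_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

n_unique = np.unique(boot_idx).size
print(f"distinct rows in the bootstrap sample: {n_unique} of {n_samples}")
print("root value, bootstrap tree:", boot_tree.tree_.value[0, 0, 0])
print("root value, full-data tree:", full_tree.tree_.value[0, 0, 0])
```

Because the resample repeats some rows and omits others, the two trees are trained on different data, and their splits and leaf values generally differ.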

GradientBoostingRegressor has an initial value for each sample to boost from:

init : estimator or ‘zero’, default=None
An estimator object that is used to compute the initial predictions. init has to provide fit and predict. If ‘zero’, the initial raw predictions are set to zero. By default a DummyEstimator is used, predicting either the average target value (for loss=’squared_error’), or a quantile for the other losses.
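The effect of the default init on the first tree can be checked directly. This is a small sketch, assuming the default squared-error loss: the first boosted tree is fit to the residuals y - mean(y), so its root node value should be roughly zero, while a bare tree's root value is the mean of y. The `random_state=0` setting is an arbitrary choice for repeatability.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

bare_tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

# Default init: raw predictions start at the average target value
default_gbr = GradientBoostingRegressor(max_depth=4, n_estimators=1, random_state=0).fit(X, y)
first_tree = default_gbr.estimators_[0][0]

# For regression trees, tree_.value[0] holds the mean target seen at the root
bare_root = bare_tree.tree_.value[0, 0, 0]
gbr_root = first_tree.tree_.value[0, 0, 0]

print("mean of y:           ", y.mean())
print("bare tree root value:", bare_root)
print("GBR tree root value: ", gbr_root)
```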

So the main difference between your tree and the single-estimator GBM is that the latter's leaf values are shifted by the average target value. Setting init='zero' gets us much closer, but I do still see some differences in the chosen splits further down the tree. Those are due to ties between equally good candidate splits, which are broken randomly, and can be fixed by setting a common random_state throughout.
