Boosting vs. Ensemble models


Can someone compare and contrast these two concepts in layman's terms for me? The definitions sound similar, but I know there have to be more differences between the two.

I have:

  • Ensemble models: Combining multiple ML models to get a better model.
  • Boosting: Improving a single weak model by combining it with a number of other weak models to generate a collectively strong model.


duffymo:

An ensemble is a weighted combination of several models that returns a single result. The weights can be thought of as a measure of your confidence in each model relative to the others.
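To make that concrete, here is a minimal sketch of such a weighted ensemble, assuming three already-fitted models (model_a, model_b, model_c are hypothetical placeholders) and per-model confidence weights:

```python
import numpy as np

def weighted_ensemble_predict(models, weights, X):
    """Return the weighted average of each model's predictions on X."""
    preds = np.array([m.predict(X) for m in models])  # shape: (n_models, n_samples)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                  # normalise so the weights sum to 1
    return weights @ preds                             # weighted combination -> single result

# Hypothetical usage: three fitted regressors and our relative confidence in each
# y_hat = weighted_ensemble_predict([model_a, model_b, model_c], [0.5, 0.3, 0.2], X_test)
```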

I thought boosting meant an iterative approach: residual errors from the prior model are fed into the subsequent model to reduce them further. I think of the errors as a new input to an iterative process that drives them closer to zero.

CutePoison:

To elaborate on @duffymo's answer:

Ensemble simply means "collection", so it is just a collection of different models (or of the same kind of model) - think of Random Forest. It is a collection of (different) decision trees whose outputs we average to create one "meta" model.
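As a rough illustration of that "collection of trees, averaged" idea, here is a simplified stand-in for a Random Forest using scikit-learn decision trees fitted on bootstrap samples (a real Random Forest also subsamples features at each split):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(50):
    # each tree sees a different bootstrap sample, so the trees end up different
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# the "meta" model: average the outputs of all the trees
y_hat = np.mean([t.predict(X) for t in trees], axis=0)
```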

I would say that boosting is an ensemble, but one created in a specific way. Different boosting algorithms do it differently, but what they have in common is that they use the errors from the previous model to create a better model in the next step. One way of creating a boosting algorithm would be (a minimal sketch follows the list):

  1. Fit some baseline model, m_0 (for regression this could be the mean of y_train)
  2. Calculate the errors/residuals, e, for y_train using the current model M = m_0
  3. Fit a model (that could be a linear regression), m_1, to predict e
  4. Create a new model as M = m_0 + m_1
  5. Repeat (2)-(4) as many times as you want, such that your model is M = m_0 + m_1 + m_2 + ...
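
Here is a minimal sketch of those five steps for regression, assuming the baseline m_0 is the mean of y_train; shallow decision trees are used as the stage-wise models here instead of linear regression, purely so that each extra stage can still pick up structure the previous ones missed:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y_train = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# step 1: baseline model m_0 -- here simply the mean of y_train
baseline = y_train.mean()
stages = []  # will hold the fitted m_1, m_2, ...

def predict(X):
    """Current combined model M(x) = m_0 + m_1(x) + m_2(x) + ..."""
    pred = np.full(len(X), baseline)
    for m in stages:
        pred += m.predict(X)
    return pred

for _ in range(10):
    e = y_train - predict(X)                                # step 2: residuals of the current M
    m_next = DecisionTreeRegressor(max_depth=2).fit(X, e)   # step 3: fit a model to the residuals
    stages.append(m_next)                                   # step 4: M = m_0 + m_1 + ... + m_next
    # step 5: the loop repeats (2)-(4)
```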

Why does this work?

The error e is defined as e = y_train - m_0(x) (where m_0(x) denotes the predictions made with m_0). We can therefore train a model, m_1, to predict e, i.e. we approximate e by m_1(x), and we get

m_1(x) ≈ y_train - m_0(x), which implies y_train ≈ m_0(x) + m_1(x) (our model in step (4)). That model is not perfect either, so we can iterate again and again, each time adding a new model that fits the residuals of the previous M.

Some algorithms, like XGBoost, also add a "learning rate", alpha, to each of the models, such that M = m_0 + alpha*m_1 + alpha*m_2 + ..., but that's another story.
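
For what it's worth, a sketch of how that shrinkage would change the loop above (reusing baseline and stages from the earlier snippet): only the combined prediction changes, with each stage's contribution scaled by alpha. The value 0.1 is purely illustrative.

```python
alpha = 0.1  # illustrative learning rate, not a recommended value

def predict(X):
    """M(x) = m_0 + alpha*m_1(x) + alpha*m_2(x) + ..."""
    pred = np.full(len(X), baseline)
    for m in stages:
        pred += alpha * m.predict(X)  # each stage contributes only a fraction of its fit
    return pred
```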