
Stephen's Blog

What Is Bagging and How Does It Work?

 

Stephen Cheng

Intro

Bagging (bootstrap aggregating) is a technique that reduces the variance of predictions by combining the results of multiple classifiers, each trained on a different sub-sample of the same dataset. The following figure will make it clearer.

Steps

The steps followed in bagging are:

1) Create Multiple Datasets

  • New datasets are formed by sampling the original data with replacement.
  • Each new dataset can contain a fraction of the columns as well as a fraction of the rows; these fractions are generally hyper-parameters of a bagging model.
  • Taking row and column fractions less than 1 helps in building robust models that are less prone to overfitting.
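
The sampling step above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name make_bagged_datasets, its parameters row_fraction and col_fraction, and the toy data are all hypothetical, and in a real dataset the label column would be kept aside rather than sampled along with the features.

```python
import random

def make_bagged_datasets(rows, n_datasets, row_fraction=0.8, col_fraction=0.8, seed=0):
    """Return a list of (column_indices, sampled_rows) pairs (hypothetical helper)."""
    rng = random.Random(seed)
    n_rows = len(rows)
    n_cols = len(rows[0])
    n_sample_rows = max(1, int(row_fraction * n_rows))
    n_sample_cols = max(1, int(col_fraction * n_cols))
    datasets = []
    for _ in range(n_datasets):
        # Rows are drawn WITH replacement, so the same row can appear twice.
        row_idx = [rng.randrange(n_rows) for _ in range(n_sample_rows)]
        # Columns are drawn WITHOUT replacement: a duplicated feature adds nothing.
        col_idx = sorted(rng.sample(range(n_cols), n_sample_cols))
        sampled = [[rows[i][j] for j in col_idx] for i in row_idx]
        datasets.append((col_idx, sampled))
    return datasets

# Toy data: 10 rows, 4 feature columns.
data = [[i, i * 2, i * 3, i % 2] for i in range(10)]
datasets = make_bagged_datasets(data, n_datasets=3, row_fraction=0.8, col_fraction=0.5)
for cols, sample in datasets:
    print("columns:", cols, "rows:", len(sample))
```

Each dataset here keeps 8 of the 10 rows (with possible repeats) and 2 of the 4 columns; both fractions are the knobs one would tune.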

2) Build Multiple Classifiers

  • A classifier is built on each dataset.
  • Generally, the same type of classifier is trained on each dataset, and each one makes its own predictions.
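
As a rough sketch of this step, the snippet below trains the same kind of model on several bootstrap samples. The model is a deliberately tiny one-feature threshold classifier (a "stump") chosen only to keep the example self-contained; the names fit_stump and fit_bagged_stumps and the toy data are assumptions, not something the steps above prescribe.

```python
import random

def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier that predicts 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for t in sorted(set(xs)):
        # Count training mistakes made by the rule "predict 1 if x > t".
        errors = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

def fit_bagged_stumps(xs, ys, n_models=5, seed=0):
    """Train one stump per bootstrap sample and return the learned thresholds."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        thresholds.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return thresholds

# Toy one-dimensional data: the class flips from 0 to 1 between 2.0 and 2.5.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = fit_bagged_stumps(xs, ys, n_models=5)
print(models)
```

Because every stump sees a slightly different sample, the learned thresholds can differ from model to model; that disagreement is what the combining step averages out.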

3) Combine Classifiers

  • The predictions of all the classifiers are combined using the mean, median or mode, depending on the problem at hand (typically the mode for class labels and the mean or median for numeric outputs).
  • The combined predictions are generally more robust than those of a single model.
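
The three ways of combining can be illustrated with Python's standard statistics module. The numbers below are made up for the example: class_votes stands for the labels five models predict for one input, and numeric_outputs for five numeric predictions where one model is far off.

```python
from statistics import mean, median, mode

# Hypothetical outputs of five models for a single input.
class_votes = [1, 0, 1, 1, 0]
numeric_outputs = [2.0, 2.5, 1.5, 10.0, 2.0]

majority_class = mode(class_votes)        # most common label wins the vote
averaged = mean(numeric_outputs)          # sensitive to the outlying 10.0
robust_middle = median(numeric_outputs)   # barely affected by the outlier

print(majority_class, averaged, robust_middle)
```

The mean is pulled upward by the single outlying model, while the median stays with the majority, which is one reason to prefer the median when individual models can be badly wrong.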

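Note that the number of models built is not really a hyper-parameter here, in the sense that it does not need tuning: a higher number of models generally performs better than, or about as well as, a lower number. It can be shown theoretically that the variance of the combined predictions is reduced to 1/n (n: number of classifiers) of the original variance, under the assumption that the individual predictions are independent and have equal variance. In practice, models trained on overlapping samples of the same data are correlated, so the reduction is smaller.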
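
The 1/n figure can be checked with a small simulation. This is a sketch of the idealized case only: each "model prediction" is drawn as an independent Gaussian with variance 1, which real bagged models do not satisfy exactly, and the function name variance_of_average is hypothetical.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def variance_of_average(n_models, n_trials=20000, seed=0):
    """Estimate the variance of the mean of n_models independent predictions."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        # Assumption: every model's prediction is independent with variance 1.
        preds = [rng.gauss(0.0, 1.0) for _ in range(n_models)]
        averages.append(sum(preds) / n_models)
    return variance(averages)

for n in (1, 4, 16):
    print(n, round(variance_of_average(n), 4), "expected about", 1 / n)
```

With independent predictions the estimated variance tracks 1/n closely; introducing correlation between the simulated predictions would make it level off instead of continuing to shrink.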

Nov 21, 2018
