
Stephen Cheng

What Is Bagging and How Does It Work?

 


Intro

Bagging (short for bootstrap aggregating) is a technique that reduces the variance of our predictions by combining the results of multiple classifiers, each modeled on a different sub-sample of the same dataset. The following figure will make it clearer.
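Before walking through the steps, here is a minimal end-to-end sketch using scikit-learn's `BaggingClassifier` (assuming scikit-learn is installed; the dataset and parameter values are illustrative, not from the article):

```python
# Minimal bagging sketch: 20 decision trees, each fit on a bootstrap
# sample of the training data, combined by majority vote.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=20, random_state=0)
bag.fit(X_tr, y_tr)
print(bag.score(X_te, y_te))  # accuracy of the combined prediction
```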

Steps

The steps followed in bagging are:

1) Create Multiple Datasets

  • Sampling is done with replacement on the original data and new datasets are formed.
  • The new data sets can have a fraction of the columns as well as rows, which are generally hyper-parameters in a bagging model.
  • Taking row and column fractions less than 1 helps in making robust models, less prone to overfitting.
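The sampling step above can be sketched in plain NumPy (the row/column fractions here are hypothetical hyper-parameter values chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))    # toy data: 100 rows, 8 columns
row_frac, col_frac = 0.8, 0.5    # example fractions < 1

# Rows are drawn WITH replacement (the bootstrap); columns are a subset.
rows = rng.choice(X.shape[0], size=int(row_frac * X.shape[0]), replace=True)
cols = rng.choice(X.shape[1], size=int(col_frac * X.shape[1]), replace=False)
X_sub = X[np.ix_(rows, cols)]
print(X_sub.shape)  # (80, 4)
```

Because rows are drawn with replacement, some rows appear more than once in `X_sub` and others not at all; repeating this step yields many different datasets from the same original data.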

2) Build Multiple Classifiers

  • Classifiers are built on each data set.
  • Generally the same classifier is modeled on each dataset and predictions are made.
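A sketch of this step, fitting the same classifier type (a decision tree here, as one common choice) to each bootstrapped dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy labels

models = []
for _ in range(10):                           # one model per bootstrap sample
    idx = rng.choice(len(X), size=len(X), replace=True)
    models.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
```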

3) Combine Classifiers

  • The predictions of all the classifiers are combined using a mean, median or mode value depending on the problem at hand.
  • The combined values are generally more robust than a single model.
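For classification the combination is typically a majority vote (the mode). With binary labels this is the same as thresholding the mean prediction at 0.5, which keeps the sketch dependency-free (the prediction matrix below is mock data):

```python
import numpy as np

# Mock predictions from 5 classifiers (rows) on 4 test points (columns).
preds = np.array([[0, 1, 1, 0],
                  [0, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 1, 0],
                  [0, 1, 1, 1]])

# Majority vote per test point: mean > 0.5 means most classifiers said 1.
majority = (preds.mean(axis=0) > 0.5).astype(int)
print(majority)  # [0 1 1 0]
```

For regression, replacing the vote with `preds.mean(axis=0)` (or a median) gives the combined numeric prediction.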

Note that the number of models built is not really a hyper-parameter here: a higher number of models is always at least as good as a lower number, giving better or similar performance. It can be shown theoretically that, under some assumptions (in particular, independence of the individual predictions), the variance of the combined prediction is reduced to 1/n of the original variance, where n is the number of classifiers.
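The 1/n claim is easy to check numerically under the idealised independence assumption, by averaging n independent unit-variance "predictions" (the numbers below are simulated, not from real models):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 25  # number of (idealised) independent classifiers

single = rng.normal(size=100_000)                   # one model: variance ~ 1
combined = rng.normal(size=(100_000, n)).mean(axis=1)  # average of n models

print(single.var())    # close to 1.0
print(combined.var())  # close to 1/25 = 0.04
```

In practice the models are trained on overlapping bootstrap samples, so their predictions are correlated and the reduction is smaller than 1/n, but the direction of the effect is the same.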

— Nov 21, 2018
