Ensemble Methods - Bagging

Bagging (or bootstrap aggregation) describes a class of ensemble methods in which each member of a collection of base classifiers is trained on a different bootstrap sample (a random draw, with replacement) from the original training data. At classification time, each of the base classifiers then "votes" on the final class label. In the examples below we will explore the use of bagging to train an ensemble of decision tree regressors; in the case of regression, the ensemble prediction is just the mean of the predictions made by the component trees.
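To make the procedure concrete, here is a minimal sketch of bagged regression trees using scikit-learn's DecisionTreeRegressor. (The cells below use a small fit_experiments wrapper instead; its implementation is not shown, so treat this sketch only as an illustration of the idea.)

In [ ]:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagged_tree_predict(x_train, y_train, x_test, num_trees=30, seed=None):
    """Train num_trees trees, each on a bootstrap sample, and average their predictions.

    Assumes x_train and x_test are 2-D arrays of shape (n_samples, n_features).
    """
    rng = np.random.default_rng(seed)
    n = len(x_train)
    predictions = []
    for _ in range(num_trees):
        # Bootstrap sample: n points drawn with replacement from the training set.
        idx = rng.integers(0, n, size=n)
        tree = DecisionTreeRegressor()
        tree.fit(x_train[idx], y_train[idx])
        predictions.append(tree.predict(x_test))
    # For regression, the ensemble prediction is the mean over the trees.
    return np.mean(predictions, axis=0)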

Bagging and Variance

First, let's look at what happens when we train a single decision tree on the full training set...

In [5]:
import numpy as np
import matplotlib.pyplot as plt
import datasource
import fit_experiments

#plt.rcParams["figure.figsize"] = (7.5,5.5)

ds = datasource.DataSource(seed=200)
x, y = ds.gen_data(80)

model1 = fit_experiments.BaggingTreeRegression(num_trees=1,
                                               max_leaf_nodes=10000000)

model1.fit(x,y)
model1.plot(x, y)
[Figure: predictions of a single, fully grown decision tree plotted against the training data]

This looks like overfitting. We could try tuning the hyperparameters of the tree, but instead, let's try training 30 decision trees, each of which will see a different draw from our training set:

In [6]:
num_trees2 = 30

model2 = fit_experiments.BaggingTreeRegression(num_trees=num_trees2,
                                               max_leaf_nodes=100000)
model2.fit(x,y)
model2.plot(x, y)
[Figure: predictions of the 30-tree bagged ensemble plotted against the training data]

As you can see, each individual training point has less of an impact on the final prediction because not all trees are trained using every point. It looks like the ensemble result is doing a better job of capturing the trends in the training data.
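One way to see why each point matters less: under bootstrap sampling, the probability that a particular training point is left out of a given tree's sample is (1 - 1/n)^n, which approaches 1/e ≈ 0.37 as n grows. A quick check for this setup (80 training points, 30 trees):

In [ ]:
n = 80           # training set size
num_trees = 30   # ensemble size

# Probability that a single bootstrap sample of size n misses a given point.
p_missing = (1 - 1/n) ** n
print("P(a given point is absent from one tree's sample): {:.3f}".format(p_missing))
print("Expected number of trees that never see it: {:.1f} of {}".format(
    num_trees * p_missing, num_trees))

So, on average, about a third of the trees never see any particular point, which is what blunts the influence of individual (possibly noisy) observations.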

We can quantify this by running some experiments to estimate the bias and variance of a single decision tree vs. a bagged ensemble:

In [7]:
num_trees1 = 1
max_leaf_nodes1 = 1000000

num_trees2 = 30
max_leaf_nodes2 = 1000000


plt.subplot(121)
print("Single Tree:")
fit_experiments.bias_variance_experiment(num_trials=100, train_size=80, 
                                        max_leaf_nodes=max_leaf_nodes1,
                                        num_trees=num_trees1,
                                        source=ds)
plt.title('trees: {}'.format(num_trees1))

plt.subplot(122)
print("\n{} Tree Ensemble:".format(num_trees2))
fit_experiments.bias_variance_experiment(num_trials=100, train_size=80, 
                                        max_leaf_nodes=max_leaf_nodes2,
                                        num_trees=num_trees2,
                                        source=ds)
plt.title('trees: {}'.format(num_trees2))
plt.show()
Single Tree:
mean squared bias: 0.0006
mean variance: 0.0726

30 Tree Ensemble:
mean squared bias: 0.0005
mean variance: 0.0367
[Figure: bias/variance experiment results; left panel titled 'trees: 1', right panel titled 'trees: 30']
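The numbers above come from fit_experiments.bias_variance_experiment, whose implementation is not shown here. The sketch below outlines how such an estimate can be computed; the true_fn method (returning the noiseless target) and the [0, 1] input range are hypothetical stand-ins for whatever the data source actually provides.

In [ ]:
def estimate_bias_variance(source, fit_predict, num_trials=100, train_size=80):
    # Fixed grid of evaluation points (input range assumed to be [0, 1]).
    x_grid = np.linspace(0, 1, 200).reshape(-1, 1)
    true_y = np.ravel(source.true_fn(x_grid))    # hypothetical noiseless target
    preds = []
    for _ in range(num_trials):
        # Draw a fresh training set for every trial and refit the model.
        x_tr, y_tr = source.gen_data(train_size)
        preds.append(fit_predict(x_tr.reshape(-1, 1), y_tr, x_grid))
    preds = np.array(preds)                      # shape: (num_trials, grid size)
    mean_pred = preds.mean(axis=0)               # the "average" fitted model
    squared_bias = np.mean((mean_pred - true_y) ** 2)
    variance = np.mean(preds.var(axis=0))
    return squared_bias, variance

Passing the bagged_tree_predict sketch from earlier as fit_predict (with num_trees set to 1 or 30) reproduces the structure of the comparison above: the squared bias is the average squared gap between the mean prediction and the true function, and the variance is the average spread of the predictions across training draws.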

Notice that bagging roughly halves the variance (0.0726 down to 0.0367) while leaving the already-low bias essentially unchanged. This suggests that bagging should lead to lower overall generalization error.
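This follows from the standard decomposition of expected squared error into squared bias, variance, and irreducible noise: MSE ≈ bias² + variance + noise. Plugging in the estimates above (and ignoring the noise term, which is the same for both models), the reducible error drops from about 0.0006 + 0.0726 = 0.0732 for the single tree to 0.0005 + 0.0367 = 0.0372 for the ensemble, roughly a factor of two.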