Ensemble Methods - Bagging¶
Bagging (or bootstrap aggregation) describes a class of ensemble methods where a collection of base classifiers is trained, each on a different subset of the original training data. At classification time, each of the base classifiers then "votes" on the final class label. In the examples below we will explore the use of bagging to train an ensemble of decision tree regressors. In the case of regression, the ensemble prediction is simply the mean of the predictions made by the component trees.
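To make the idea concrete, here is a minimal sketch of a bagged tree regressor built directly on scikit-learn's DecisionTreeRegressor. The SimpleBaggedTrees class is purely illustrative and is not the BaggingTreeRegression helper from the fit_experiments module used in this notebook; it assumes x has shape (n_samples, n_features), as scikit-learn expects.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class SimpleBaggedTrees:
    def __init__(self, num_trees=30, seed=0):
        self.num_trees = num_trees
        self.rng = np.random.default_rng(seed)
        self.trees = []

    def fit(self, x, y):
        n = len(x)
        self.trees = []
        for _ in range(self.num_trees):
            # Bootstrap sample: n points drawn with replacement from the training set.
            idx = self.rng.integers(0, n, size=n)
            tree = DecisionTreeRegressor()
            tree.fit(x[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict(self, x):
        # For regression, the ensemble prediction is the mean of the base tree predictions.
        return np.mean([tree.predict(x) for tree in self.trees], axis=0)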
Bagging and Variance¶
First, let's look at what happens when we train a single decision tree on the full training set...
import numpy as np
import matplotlib.pyplot as plt
import datasource
import fit_experiments
#plt.rcParams["figure.figsize"] = (7.5,5.5)
ds = datasource.DataSource(seed=200)
x, y = ds.gen_data(80)
model1 = fit_experiments.BaggingTreeRegression(num_trees=1,
                                               max_leaf_nodes=10000000)
model1.fit(x,y)
model1.plot(x, y)
This looks like overfitting. We could try tuning the hyperparameters of the tree, but instead, let's try training 30 decision trees, each of which will see a different draw from our training set:
num_trees2 = 30
model2 = fit_experiments.BaggingTreeRegression(num_trees=num_trees2,
                                               max_leaf_nodes=100000)
model2.fit(x,y)
model2.plot(x, y)
As you can see, each individual training point has less of an impact on the final prediction because not all trees are trained using every point. It looks like the ensemble result is doing a better job of capturing the trends in the training data.
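One way to see why individual points matter less is to look at the bootstrap samples themselves. Assuming each tree is trained on a standard bootstrap sample the same size as the training set, any one sample contains only about 1 - 1/e ≈ 63% of the distinct training points, so each tree never sees roughly a third of the data. A quick check with numpy:
import numpy as np

rng = np.random.default_rng(0)
n = 80                                           # same size as the training set above
fractions = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)             # one bootstrap draw
    fractions.append(len(np.unique(idx)) / n)    # fraction of distinct points it contains
print(np.mean(fractions))                        # close to 1 - 1/e ≈ 0.632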
We can quantify this by running some experiments to estimate the bias and variance of a single decision tree vs. a bagged ensemble:
num_trees1 = 1
max_leaf_nodes1 = 1000000
num_trees2 = 30
max_leaf_nodes2 = 1000000
plt.subplot(121)
print("Single Tree:")
fit_experiments.bias_variance_experiment(num_trials=100, train_size=80,
                                         max_leaf_nodes=max_leaf_nodes1,
                                         num_trees=num_trees1,
                                         source=ds)
plt.title('trees: {}'.format(num_trees1))
plt.subplot(122)
print("\n{} Tree Ensemble:".format(num_trees2))
fit_experiments.bias_variance_experiment(num_trials=100, train_size=80,
                                         max_leaf_nodes=max_leaf_nodes2,
                                         num_trees=num_trees2,
                                         source=ds)
plt.title('trees: {}'.format(num_trees2))
plt.show()
Single Tree:
mean squared bias: 0.0006
mean variance: 0.0726

30 Tree Ensemble:
mean squared bias: 0.0005
mean variance: 0.0367
Notice that bagging roughly halves the variance while leaving the already-low bias essentially unchanged. This suggests that bagging should lead to lower overall generalization error.
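For reference, here is a rough sketch of how a bias/variance experiment like the one above could be structured (the actual implementation of fit_experiments.bias_variance_experiment may differ). The make_model argument is a hypothetical factory for whatever model is being tested, and models are assumed to expose fit and predict. Each trial fits a fresh model on an independent training draw, every model predicts on the same fixed evaluation set, and the spread of those predictions gives the variance while the error of their average gives the squared bias.
import numpy as np

def bias_variance_sketch(source, make_model, num_trials=100, train_size=80, test_size=200):
    # Fixed evaluation inputs and targets; ideally these would be noise-free
    # function values if the data source can provide them.
    x_test, y_test = source.gen_data(test_size)
    preds = []
    for _ in range(num_trials):
        x_train, y_train = source.gen_data(train_size)   # fresh training draw
        model = make_model()
        model.fit(x_train, y_train)
        preds.append(model.predict(x_test))
    preds = np.array(preds)                              # shape: (num_trials, test_size)
    mean_pred = preds.mean(axis=0)
    squared_bias = np.mean((mean_pred - y_test) ** 2)    # how far the average model is off
    variance = np.mean(preds.var(axis=0))                # how much the models disagree with each other
    return squared_bias, variance
Averaging many trees directly shrinks the variance term in this decomposition, which is exactly the effect we see in the numbers above.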