Cost-complexity pruning of random forests
This is an incremental improvement for machine learning practitioners seeking more efficient random forest models.
The paper tackled the problem of reducing random forest size without significant accuracy loss by using out-of-bag samples for post-pruning, showing consistent decreases in forest size on four UCI datasets.
Random forests perform bootstrap aggregation by sampling the training set with replacement. This enables the evaluation of the out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training samples to improve each decision tree in the ensemble. We study the effect of using the out-of-bag samples to improve the generalization error, first of the individual decision trees and second of the random forest, by post-pruning. A preliminary empirical study on four UCI repository datasets shows a consistent decrease in the size of the forests without considerable loss in accuracy.
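To make the idea concrete, the following is a minimal sketch of out-of-bag-guided pruning for a single tree, not the authors' implementation. It assumes a toy one-dimensional dataset, a simple tree grown on misclassification counts, and a weakest-link pruning sequence that collapses one internal node at a time; the subtree with the lowest out-of-bag error is kept. All function names (`bootstrap_split`, `build`, `weakest_link`, `prune_with_oob`) are hypothetical and chosen for illustration only.

```python
import copy
import random

def bootstrap_split(n, rng):
    """Draw n indices with replacement; return (in-bag, out-of-bag) indices."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    drawn = set(in_bag)
    return in_bag, [i for i in range(n) if i not in drawn]

def majority(labels):
    return max(sorted(set(labels)), key=labels.count)

def errors(rows):
    """Training misclassifications if rows were a single majority-vote leaf."""
    labels = [y for _, y in rows]
    m = majority(labels)
    return sum(y != m for y in labels)

def build(rows, depth=0, max_depth=6):
    """Grow a binary tree on (x, y) rows with scalar x (toy learner)."""
    node = {"label": majority([y for _, y in rows]), "err": errors(rows)}
    xs = sorted({x for x, _ in rows})
    if node["err"] == 0 or depth == max_depth or len(xs) < 2:
        return node
    def cost(t):
        return (errors([r for r in rows if r[0] <= t])
                + errors([r for r in rows if r[0] > t]))
    thr = min(xs[:-1], key=cost)  # thresholds at data points keep both sides non-empty
    node["thr"] = thr
    node["left"] = build([r for r in rows if r[0] <= thr], depth + 1, max_depth)
    node["right"] = build([r for r in rows if r[0] > thr], depth + 1, max_depth)
    return node

def predict(node, x):
    while "thr" in node:
        node = node["left"] if x <= node["thr"] else node["right"]
    return node["label"]

def leaves_and_err(node):
    """Return (leaf count, training errors summed over leaves) of a subtree."""
    if "thr" not in node:
        return 1, node["err"]
    ll, le = leaves_and_err(node["left"])
    rl, re = leaves_and_err(node["right"])
    return ll + rl, le + re

def internal_nodes(node):
    if "thr" in node:
        yield node
        yield from internal_nodes(node["left"])
        yield from internal_nodes(node["right"])

def weakest_link(tree):
    """Internal node minimising g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1)."""
    def g(t):
        n_leaves, sub_err = leaves_and_err(t)
        return (t["err"] - sub_err) / (n_leaves - 1)
    return min(internal_nodes(tree), key=g)

def oob_error(tree, rows):
    return sum(predict(tree, x) != y for x, y in rows) / len(rows)

def prune_with_oob(tree, oob_rows):
    """Walk the pruning sequence; keep the subtree with the lowest OOB error."""
    tree = copy.deepcopy(tree)  # leave the caller's tree untouched
    best, best_err = copy.deepcopy(tree), oob_error(tree, oob_rows)
    while "thr" in tree:
        t = weakest_link(tree)
        for key in ("thr", "left", "right"):
            del t[key]  # collapse the weakest internal node into a leaf
        err = oob_error(tree, oob_rows)
        if err <= best_err:  # ties favour the smaller tree
            best, best_err = copy.deepcopy(tree), err
    return best

# Toy data (an assumption): y = 1 when x > 0.5, with 20% label noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    data.append((x, 1 - y if rng.random() < 0.2 else y))

in_bag, oob = bootstrap_split(len(data), rng)
full = build([data[i] for i in in_bag])
oob_rows = [data[i] for i in oob]
pruned = prune_with_oob(full, oob_rows)
print(leaves_and_err(full)[0], "->", leaves_and_err(pruned)[0], "leaves")
print(oob_error(full, oob_rows), "->", oob_error(pruned, oob_rows), "OOB error")
```

In a forest, the same step would presumably be repeated per tree, each with its own out-of-bag set. Note that textbook cost-complexity pruning collapses all nodes tied at the minimum g(t) together and indexes the resulting subtrees by a complexity parameter; the one-node-at-a-time walk here is a simplification.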