Confidence intervals for the random forest generalization error
The method offers a computationally efficient way to quantify uncertainty in the random forest generalization error. The contribution is incremental in that it builds on the existing out-of-bag estimates.
The authors tackle the problem of constructing confidence intervals for the random forest generalization error without data splitting or model retraining, and show through simulations that the resulting intervals have good coverage, widths that shrink with the training sample size, and low computational cost.
We show that the byproducts of the standard training process of a random forest yield not only the well-known and almost computationally free out-of-bag point estimate of the model generalization error, but also a direct path to confidence intervals for the generalization error that avoids data splitting and model retraining. Besides their low computational cost, these confidence intervals are shown through simulations to have good coverage and widths that shrink at an appropriate rate as the training sample size grows.
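To make the idea of reusing out-of-bag byproducts concrete, the following is a minimal sketch of a naive normal-approximation interval around the out-of-bag error. It is not the construction proposed in the paper, which the abstract does not spell out. It assumes that a per-observation out-of-bag loss is already available (one value per training point, computed only from trees whose bootstrap sample excluded that point) and it treats those losses as independent, which is a simplification because they share trees. The function name `oob_error_interval` and the example data are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def oob_error_interval(oob_losses, level=0.95):
    """Naive normal-approximation interval for the mean out-of-bag loss.

    oob_losses: one loss per training point, assumed to come from the
    trees that did not see that point during training.
    Returns (point_estimate, lower, upper).
    """
    n = len(oob_losses)
    if n < 2:
        raise ValueError("need at least two out-of-bag losses")
    point = mean(oob_losses)
    # Standard error under the simplifying independence assumption.
    std_err = stdev(oob_losses) / math.sqrt(n)
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return point, point - z * std_err, point + z * std_err


# Hypothetical 0/1 out-of-bag misclassification indicators.
losses = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
point, lower, upper = oob_error_interval(losses)
```

Because the standard error scales as one over the square root of the number of observations, the width of this naive interval shrinks as the training sample grows. Whether its coverage is adequate depends on how strongly the out-of-bag losses are correlated, which this sketch does not address.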