ML, LG | Oct 1, 2018

On Theory for BART

arXiv:1810.00787v211.259 citations

Originality: Incremental advance

AI Analysis

This work addresses a theoretical gap for practitioners using BART, though it is incremental as it builds on prior foundational studies.

The paper tackles the lack of theoretical guarantees for BART by analyzing its exact prior and proposing a modification under which the posterior converges at the optimal rate.

Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying the foundations for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into branching process theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes exploiting their connection to random walks. We conclude with a result stating the optimal rate of posterior convergence for BART.
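The link between tree priors and branching processes can be made concrete with a small simulation. The sketch below assumes the split probability of Chipman et al. (2010), in which a node at depth d splits into two children with probability alpha * (1 + d)^(-beta), and treats the number of nodes in a tree as the total progeny of a heterogeneous Galton-Watson process. The function names, the default values alpha = 0.95 and beta = 2, and the Monte Carlo tail estimate are illustrative choices, not the paper's construction or its bounds.

```python
import random


def split_prob(depth, alpha=0.95, beta=2.0):
    """Assumed BART-style split probability for a node at the given depth."""
    return alpha * (1.0 + depth) ** (-beta)


def total_progeny(rng, alpha=0.95, beta=2.0, max_nodes=10_000):
    """Simulate one tree generation by generation and return its node count.

    Each node splits into two children with a depth-dependent probability,
    so the offspring distribution changes across generations (heterogeneous GW).
    max_nodes is a safety cap for parameter choices that rarely die out.
    """
    total = 0
    generation = 1  # the root node
    depth = 0
    while generation > 0 and total < max_nodes:
        total += generation
        p = split_prob(depth, alpha, beta)
        splits = sum(1 for _ in range(generation) if rng.random() < p)
        generation = 2 * splits  # every split creates two children
        depth += 1
    return total


def tail_estimate(threshold, n_trees=2000, seed=0, **prior_params):
    """Monte Carlo estimate of P(total progeny > threshold)."""
    rng = random.Random(seed)
    hits = sum(total_progeny(rng, **prior_params) > threshold
               for _ in range(n_trees))
    return hits / n_trees


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        print(k, tail_estimate(k))
```

Running the script prints empirical tail probabilities for a few thresholds, which gives a rough picture of how quickly the prior on tree size decays under these assumed settings.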
