Fully Nonparametric Bayesian Additive Regression Trees
This work addresses the problem of misleading inference in BART, which matters to statisticians and data scientists, though it is an incremental extension focused on making the error distribution flexible.
The paper tackles BART's restrictive IID normal error assumption by extending BART with a Dirichlet process mixture that models the errors nonparametrically. The resulting method maintains performance under normality while adapting to non-normal errors, without requiring the user to specify new parameters.
Bayesian Additive Regression Trees (BART) is a fully Bayesian approach to modeling with ensembles of trees. BART can uncover complex regression functions with high-dimensional regressors in a fairly automatic way and provide Bayesian quantification of the uncertainty through the posterior. However, BART assumes IID normal errors. This strong parametric assumption can lead to misleading inference and uncertainty quantification. In this paper, we use the classic Dirichlet process mixture (DPM) mechanism to nonparametrically model the error distribution. A key strength of BART is that default prior settings work reasonably well in a variety of problems. The challenge in extending BART is to choose the parameters of the DPM so that the strengths of the standard BART approach are not lost when the errors are close to normal, but the DPM has the ability to adapt to non-normal errors.
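To make the DPM error mechanism concrete, the following is a minimal sketch of drawing regression errors from a Dirichlet process mixture of normals using a truncated stick-breaking construction. It is an illustration under stated assumptions, not the paper's sampler: the function name `sample_dpm_errors`, the truncation level, the concentration parameter `alpha`, and the base measure for the component means and scales are all hypothetical choices made for this example.

```python
import numpy as np

def sample_dpm_errors(n, alpha=1.0, truncation=50, seed=0):
    """Draw n errors from a truncated stick-breaking DP mixture of normals.

    All defaults are illustrative assumptions, not settings from the paper.
    """
    rng = np.random.default_rng(seed)

    # Stick-breaking weights: w_k = v_k * prod_{j<k} (1 - v_j), v_k ~ Beta(1, alpha).
    v = rng.beta(1.0, alpha, size=truncation)
    v[-1] = 1.0  # close the stick so the truncated weights sum to 1
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))

    # Component parameters drawn from a hypothetical base measure:
    # normal means and gamma-distributed precisions.
    mu = rng.normal(0.0, 1.0, size=truncation)
    precision = rng.gamma(shape=3.0, scale=1.0 / 3.0, size=truncation)
    sigma = 1.0 / np.sqrt(precision)

    # Assign each observation to a component, then draw its error.
    z = rng.choice(truncation, size=n, p=w)
    eps = rng.normal(mu[z], sigma[z])
    return eps, w, z

eps, w, z = sample_dpm_errors(200)
```

In this sketch, a small `alpha` concentrates most of the weight on a few components, so the error distribution stays close to a single normal, while a larger `alpha` spreads weight over more components and allows skewed or multimodal errors. This is the trade-off that the choice of DPM parameters has to balance.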