Giacomo Petrillo

h-index1
2papers

2 Papers

MLOct 30, 2024Code
Very fast Bayesian Additive Regression Trees on GPU

Giacomo Petrillo

Bayesian Additive Regression Trees (BART) is a nonparametric Bayesian regression technique based on an ensemble of decision trees. It is part of the toolbox of many statisticians. The overall statistical quality of the regression is typically higher than other generic alternatives, and it requires less manual tuning, making it a good default choice. However, it is a niche method compared to its natural competitor XGBoost, due to the longer running time, making sample sizes above 10,000-100,000 a nuisance. I present a GPU-enabled implementation of BART, faster by up to 200x relative to a single CPU core, making BART competitive in running time with XGBoost. This implementation is available in the Python package bartz.

MLOct 26, 2024
On the Gaussian process limit of Bayesian Additive Regression Trees

Giacomo Petrillo

Bayesian Additive Regression Trees (BART) is a nonparametric Bayesian regression technique of rising fame. It is a sum-of-decision-trees model, and is in some sense the Bayesian version of boosting. In the limit of infinite trees, it becomes equivalent to Gaussian process (GP) regression. This limit is known but has not yet led to any useful analysis or application. For the first time, I derive and compute the exact BART prior covariance function. With it I implement the infinite trees limit of BART as GP regression. Through empirical tests, I show that this limit is worse than standard BART in a fixed configuration, but also that tuning its hyperparameters in the natural GP way makes it competitive with BART. The advantage of using a GP surrogate of BART is the analytical likelihood, which simplifies model building and sidesteps the complex BART MCMC algorithm. More generally, this study opens new ways to understand and develop BART and GP regression. The implementation of BART as GP is available in the Python package lsqfitgp.