MLLGOct 30, 2024

Very fast Bayesian Additive Regression Trees on GPU

arXiv:2410.23244v21 citationsh-index: 1Has Code
Originality Incremental advance
AI Analysis

This work addresses a performance bottleneck for statisticians and data scientists using BART, enabling its application to larger datasets where it was previously impractical.

The paper tackles the slow running time of Bayesian Additive Regression Trees (BART) for large sample sizes by presenting a GPU-enabled implementation that is up to 200x faster than a single CPU core, making BART competitive with XGBoost in speed.

Bayesian Additive Regression Trees (BART) is a nonparametric Bayesian regression technique based on an ensemble of decision trees. It is part of the toolbox of many statisticians. The overall statistical quality of the regression is typically higher than other generic alternatives, and it requires less manual tuning, making it a good default choice. However, it is a niche method compared to its natural competitor XGBoost, due to the longer running time, making sample sizes above 10,000-100,000 a nuisance. I present a GPU-enabled implementation of BART, faster by up to 200x relative to a single CPU core, making BART competitive in running time with XGBoost. This implementation is available in the Python package bartz.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes